
NetIQ AppManager Implementation Blueprint

Version 3.0

Contents
SCOPE
METHODOLOGY OVERVIEW
CRITICAL SUCCESS FACTORS
THE ASSESS PHASE
THE PLAN PHASE
THE DESIGN PHASE
THE DEPLOY PHASE
THE MANAGE PHASE
APPENDIX A – SAMPLE DOCUMENTS
- Requirements Definition Document
- Inventory Document
- Management Strategy & Vision
- Standard Monitoring Policy
- Additional Reading
- Sample Project Plan

A Services Guidebook for Success
March 2008
NetIQ’s AppManager product is arguably the most straightforward and easy-to-use service management software on the market today. AppManager’s key strengths include the depth and breadth of its systems and application monitoring, its detailed and management-friendly reporting, and its ease of installation and configuration. While AppManager is easy to install and operate, it is still a mature enterprise application that is often installed to help solve very complex business problems, to simplify or automate technical functions, or to provide service management to very demanding business and technical customers. To ensure long-term success, treat your AppManager implementation as a full enterprise application implementation project rather than a casual software installation. This does not mean that your AppManager implementation will be an ongoing, never-ending project, but it does mean that proper planning is required to be successful.

A typical AppManager implementation ranges from three to twelve weeks in duration. A good rule of thumb is to allow at least two weeks for planning and assessment, plus two weeks per one hundred agents to be deployed. The purpose of this Blueprint is not to provide a step-by-step installation guide, but rather to provide common guidelines and an easy-to-follow implementation methodology that the AppManager project team can use to help ensure a successful implementation.
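As a rough illustration of the rule of thumb above, the following Python sketch turns an agent count into a duration estimate. It is not a NetIQ tool; the function name is hypothetical, and rounding a partial block of 100 agents up to a full block is an assumption rather than part of the guideline.

```python
import math


def estimate_weeks(agent_count: int,
                   planning_weeks: float = 2.0,
                   weeks_per_100_agents: float = 2.0) -> float:
    """Rough project duration in weeks from the rule of thumb.

    planning_weeks: minimum time for planning and assessment (two weeks).
    weeks_per_100_agents: deployment time per block of 100 agents (two weeks).
    A partial block (e.g. 250 agents -> 3 blocks) is rounded up; that
    rounding is an assumption, not part of the rule of thumb itself.
    """
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    blocks = math.ceil(agent_count / 100)
    return planning_weeks + blocks * weeks_per_100_agents
```

For example, 250 agents gives 2 + 3 × 2 = 8 weeks under these assumptions, which falls inside the three-to-twelve-week range quoted above.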

SCOPE
The purpose of the NetIQ AppManager Implementation Blueprint is to provide a standard methodology for implementing NetIQ AppManager in your environment. Based on experience drawn from a multitude of successful implementations by NetIQ’s Global Services organization, as well as on industry-standard best practices (ITIL and COBIT), this guide was designed specifically to help customers implement AppManager effectively, efficiently and to its fullest potential. Having a firm understanding of the requirements, activities and deliverables associated with each phase of an AppManager implementation project before starting the project will greatly increase the probability of success in the short term. It will also enable the solution to continue providing substantial value over the long term to the business and IT customers who rely on it.

METHODOLOGY OVERVIEW
The implementation methodology is broken into five high-level phases. The phases are designed to be run in sequence, because the outputs from one phase are required inputs into the next phase.

NOTE: You may find that you have already completed some of the phases described below. Please validate that you have all of the required inputs listed in each phase description in this document.

Phase: Assess
Description: The first activity in any project is to evaluate the existing environment and understand the design goals of the AppManager solution within the customer enterprise. Most of this information must come from your IT staff and existing documentation. This section describes the information that the project team needs before it can complete the Plan, Design, Deploy and Manage phases of the project.


Phase: Plan
Description: This phase focuses on defining and validating the project objectives and key success criteria. It also includes developing a comprehensive project plan that covers all major project milestones, resources and project responsibilities across the different IT-related competencies in your organization.

Phase: Design
Description: The design phase of the NetIQ methodology examines three key areas: people, process and technology. The project requirements and knowledge of the organization are used to determine the required architecture. A solution design specification is developed that includes the physical, logical and security design for the proposed architecture, as well as the integration and monitoring requirements. The project plan is updated based on the new design.

Phase: Deploy
Description: This phase includes all key activities involved in moving into full production deployment: building the environment, installing the core software, piloting the initial agents, and deploying monitoring policies, reporting and alerting. It includes a post-pilot review and optimization, and the development of a production deployment plan before that plan is executed. It also includes the development of all production operational procedures and documentation.

Phase: Manage
Description: This phase of the project should be repeated regularly to ensure optimization and alignment with best practices. It involves reviewing new technical, business and functional requirements to determine required system changes. An initial review of the deployed solution should be conducted no less than thirty (30) and no more than sixty (60) days after the go-live date to ensure that the solution is meeting the initial requirements of the project. Beyond the initial review, a regular review should be conducted at least every six (6) months.
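To put the Manage-phase reviews on the project calendar, the Python sketch below computes the 30-to-60-day initial review window and the latest due dates for later reviews. Counting the six-month interval from the end of the initial window, and the function names, are assumptions for illustration only.

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def review_schedule(go_live: date, follow_ups: int = 2):
    """Return (initial_window, later_due_dates).

    initial_window: (earliest, latest) date for the first review,
    30 and 60 days after go-live.
    later_due_dates: latest dates for subsequent reviews, six months
    apart, counted from the end of the initial window (an assumption).
    """
    window = (go_live + timedelta(days=30), go_live + timedelta(days=60))
    due = [add_months(window[1], 6 * k) for k in range(1, follow_ups + 1)]
    return window, due
```

With a go-live date of 3 March 2008, the initial review falls between 2 April and 2 May 2008, and the next two reviews are due by 2 November 2008 and 2 May 2009.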
Project Management and Knowledge Transfer
Project management and knowledge transfer should be consistent throughout the entire lifecycle of the project. Project management is required to keep team members focused on their assigned activities and to communicate any changes in timelines or activities to the appropriate personnel.


CRITICAL SUCCESS FACTORS
Critical Success Factors (CSFs) are the few actions, outcomes or conditions that contribute most directly to the success of the project. CSFs can be grouped into several areas, including project sponsorship, project management, and project staffing and delivery. Ultimately, it is you (the customer) who is responsible for the successful implementation of your NetIQ investment. This responsibility and accountability can be shared across key project resources, including your own internal project team, outside consultants and/or NetIQ Global Services consultants, but it is critical that the project have a clear vision and executive sponsorship. The critical success factors for a NetIQ AppManager implementation are as follows:

• Project Vision
Having a clearly defined vision that all team members agree with is vitally important. A clearly defined vision keeps the project team focused on the overall end goal rather than getting consumed with individual tasks.

• Management and Customer Involvement
Management commitment and customer involvement are key to the success of any major project. Because implementing a Service Management solution (which includes Systems, Network and Enterprise Management) typically changes the way that IT currently operates, senior management must be fully committed to the success of the project. Your active involvement is also critical. You will have the final say on the success or failure of the solution; therefore, requirements must be validated at the beginning of each and every phase of the project.

• Project Organization
An effective project organization structure is critical to the success of any project. Clearly defined roles, activities and responsibilities help ensure that team members focus on the tasks they are responsible for and that “turf wars” do not crop up. In smaller organizations, the same person may well hold multiple roles and responsibilities. This is fine as long as that individual understands the differences between the roles and budgets his or her time accordingly.

• Experience in Systems or Network Management Implementations
Enterprise Management, Systems Management and Network Management are very similar disciplines. Previous experience in at least one of these three areas is essential, because the processes and activities associated with good infrastructure management are sometimes hard to grasp. A good Systems Management engineer can analyze and set up monitoring of an IT environment both from the bottom up (infrastructure) and from the top down (customer).


• Knowledge of NetIQ AppManager Software
NetIQ AppManager is easy to install and operate, but it is a very mature enterprise application, so a thorough understanding of how the product works is essential. Depending on the size of the overall deployment, knowledge gained from AppManager training courses, while helpful, may not be sufficient to ensure a successful implementation. At least one member of the project team should have previous AppManager experience in an environment similar in size to, or larger than, your planned environment. If no one on your current project team has this experience, NetIQ Partners and/or the Global Services organization have teams to assist.

• Knowledge of Core IT Operations Processes
Knowledge of key IT operations processes is critical to the success of your project. Configuring the robust monitoring, reporting and alerting capabilities of NetIQ AppManager requires many decisions about how IT operates today and how it will operate in the future. These decisions need to be made in conjunction with the overall IT and business strategies.

Taking into account the above Critical Success Factors, let’s look at the individual phases in detail.

THE ASSESS PHASE
The Assess Phase of the methodology is concerned with defining your current business, technical and functional requirements (“To be” or “What I want”), as well as taking an inventory of and evaluating your current environment (“As is” or “What I have”). The two outputs are compared, and any difference between them is documented in a Gap Analysis document. At the end of this exercise, you should have three key pieces of information:
• What we want
• What we have
• What are the gaps?

It is important to note that requirements and gaps are not necessarily technical in nature. Many requirements in a Service Management solution deal with people and processes as well as technology. For example, a particular requirement may be that critical outage notifications are acknowledged by support teams within ten minutes. This requirement can only be met by a combination of people, process and technology; technology merely provides the trigger. Some key contributing factors to undertaking a Service Management initiative may include:
• Need to better assure business service availability and reliability
• Need to better assure regulatory or statutory compliance
• Need to better partner with business customers in the delivery of IT-based services
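Because technology only provides the trigger, it helps to measure whether the people-and-process side actually meets a requirement such as the ten-minute acknowledgement target described above. The Python sketch below flags notifications that were not acknowledged in time. The record layout (`id`, `raised`, `acked`) is hypothetical; in practice the timestamps would come from whatever notification or help desk system records acknowledgements.

```python
from datetime import datetime, timedelta

ACK_TARGET = timedelta(minutes=10)  # the example requirement from the text


def overdue_notifications(notifications, now: datetime):
    """Return IDs of notifications not acknowledged within ACK_TARGET.

    Each notification is a dict with hypothetical keys:
    'id', 'raised' (datetime) and 'acked' (datetime or None).
    An unacknowledged notification only counts as overdue once its
    deadline has passed at time `now`.
    """
    overdue = []
    for n in notifications:
        deadline = n["raised"] + ACK_TARGET
        acked = n["acked"]
        if acked is None:
            if now > deadline:
                overdue.append(n["id"])
        elif acked > deadline:
            overdue.append(n["id"])
    return overdue
```

A report like this supports the people and process remedies (training, escalation) rather than replacing them.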

INPUTS:
Much of the data that you need to collect should already be available. This may include:
• Corporate visions, strategies, objectives, policies and plans
• Management and Sr. Management goals and objectives
• Business visions, strategies, objectives and plans
• Organizational charts
• Requests for Proposals (RFPs)
• Existing IT infrastructure strategies
- IT strategies and plans
- Overall IT architecture
- Management architecture and framework
• Service Management visions, strategies, objectives and plans
- SLM plans, service catalogs
- Financial plan
- Capacity plan
- Availability plan
- IT business continuity plan (disaster recovery)
- Security policies, handbooks and plans
• Current IT environment
- List of current applications in use
- Current processes and procedures
- Current staff knowledge and skills
- IT business plan

Aside from basic data collection and analysis, it is imperative that you understand your customers’ wants and needs. The easiest way to do this is to first identify and then interview key stakeholders. More often than not, you will find that your customers are much savvier than they are usually given credit for when it comes to understanding the complexities of, and the demands placed on, the IT infrastructure.

• Interviews
- Management and Sr. Management
  • “When your boss calls you, what information is he/she asking for?”
  • “What information do you need to do your job effectively?”
  • “What percentage of time are you spending reacting to problems or fire-fighting?”
  • “How do you justify expenses?”
- IT Customers
  • “How do you use IT services in your daily job?”
  • “Is IT in your company a good investment?”
  • “Is IT doing a good job?”
- IT Operations
  • “When your boss calls you, what information is he/she asking for?”
  • “What information do you need to do your job effectively?”
  • “What percentage of time do you spend reacting to problems or fire-fighting?”

ACTIVITIES:
Activity 1: Gather/Define or Validate Requirements
• Usability Requirements
The main purpose of usability requirements is to ensure that the solution meets its customers’ expectations of how easy the product is to use. Examples of usability requirements include:
- The system must be accessible via standard web browser software.
- The system should be able to perform all required functions using “out-of-the-box” functionality (no custom development required).
- Reports should be accessible through a standard web browser and/or deliverable through the company email platform.


• Technical Requirements
Technical requirements define requirements or constraints on the IT system. Most importantly, they serve as a key driver in the development of the solution architecture. Examples of technical requirements are as follows:
- Ease of installation. Can the product be installed easily with little to no custom coding?
- Compatibility. Is the system compatible with our standard software/hardware infrastructure?
- Security. Will the application support your current security model?
- Availability. What are the required availability levels?
- Interoperability. Will the system interface properly with other established systems?
- Ease of maintenance. How are patches applied? Can the application be adjusted on the fly?
- Capacity and performance. How large does the system need to scale? How quickly can it generate the required data?
• Functional Requirements
Functional requirements describe what the application is intended to do. They can be expressed as services that the application is intended to provide or tasks that it needs to perform. Examples of functional requirements related to service or systems management include:
- The system must be able to monitor the availability of Windows and UNIX servers.
- The system must be able to capture and report against server and application performance data.
- The system must be able to monitor our standard infrastructure applications (Exchange, AD, etc.).
- The system must be able to monitor the performance and availability of our line-of-business applications (Oracle, WebLogic, etc.).

Activity 2: Gather Inventory
• Security Architecture
- How are server domains and trusts currently defined? (Although you can monitor across trusted and untrusted domains, you should collect information about domains and trusts for installation purposes.)
- Which accounts have Administrator privileges in which domains?
- Firewall locations and configuration information
- Access control: Who needs access to the system? Who is allowed to make changes to deployed monitoring? Who needs to be able to update alerts and view reports?
• Network Architecture
Information about your network topology and its limitations is likely to affect how you distribute AppManager components. For example, based on this information you may decide to create two regional management sites instead of one centralized site for network efficiency. This information is also useful for determining how best to group computers for remote agent installation.
- Local area network (LAN) architecture for each site to be managed
- Wide area network (WAN) architecture diagrams (if AppManager is to operate across WAN links) and information regarding line speeds and available bandwidth
- LAN/WAN utilization figures
- Firewall locations and configuration information
- Domain and Active Directory architecture
- VPN access
- DMZ and Internet-facing devices to be monitored
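One way to use the network inventory when grouping computers for remote agent installation is to batch hosts by subnet, so each batch stays on one side of a WAN link. The Python sketch below does this with the standard `ipaddress` module. Treating a /24 subnet as a proxy for network locality is an assumption; if your inventory records real site or WAN boundaries, group by those instead.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(hosts, prefix_len: int = 24):
    """Group inventory hosts into install batches by IPv4 subnet.

    hosts: iterable of (hostname, ip_string) pairs from the inventory.
    prefix_len: subnet size used as a stand-in for network locality
    (an assumption, not an AppManager setting).
    Returns a dict mapping "network/prefix" to a list of hostnames.
    """
    groups = defaultdict(list)
    for name, ip in hosts:
        # strict=False masks off the host bits, e.g. 10.1.1.10 -> 10.1.1.0/24
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(net)].append(name)
    return dict(groups)
```

Each resulting group can then be scheduled as a separate remote installation batch.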


• Key Business Applications
Understanding the key applications in your business is a vital part of providing service management detail. Focus on applications that fall into any of the following categories:
- The application participates directly in the company’s ability to make sales or generate revenue. Question to ask: Will the company still be able to make money if this application is unavailable? A perfect example is a web-based online store: a certain percentage of the company’s revenue, as well as its customer-facing image, may be tied to that web site.
- The application participates directly in the company’s ability to ship or deliver its products and/or services. Question to ask: Can the company’s products and/or services still be delivered to customers on time if this application is unavailable?
- If the application is unavailable, the company risks failing to comply with regulatory or statutory requirements such as Sarbanes-Oxley, HIPAA or 21 CFR Part 11.
- The application enables the company to carry out its day-to-day business. A prime example is a company’s messaging service. If email is unavailable, the company will most likely still be able to sell, manufacture and distribute product; however, the way internal business is conducted will be drastically impaired.
• Server Architecture
- How many Windows and UNIX servers are to be monitored?
- How many application servers of each type (such as Exchange, Domino, or Oracle RDBMS servers) are to be monitored?
- How many hardware types are to be monitored?
Note that a single computer may fall into more than one category. For example, you may have a Dell server with Windows Server 2003 and Exchange Server installed on it.
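Because one computer can fall into several categories, the number of things to monitor can be larger than the number of servers. The Python sketch below tallies planned monitoring categories from a simple inventory. The category labels and server names are hypothetical examples, and the tally is only an input to sizing decisions, not an AppManager sizing formula.

```python
from collections import Counter


def category_counts(inventory):
    """Count how many servers fall into each monitoring category.

    inventory maps a server name to the set of categories you plan to
    monitor on it (hypothetical labels such as "Windows", "Exchange",
    "Hardware"). Each server is counted once per category, so the
    category totals can exceed the number of servers.
    """
    counts = Counter()
    for categories in inventory.values():
        counts.update(categories)
    return counts


# Hypothetical inventory; mail01 mirrors the Dell/Windows/Exchange example.
inventory = {
    "mail01": {"Windows", "Exchange", "Hardware"},
    "file01": {"Windows"},
    "db01": {"UNIX", "Oracle"},
}
```

Here three servers produce six monitored categories; dropping "Exchange" and "Hardware" on mail01 would cut that to four, which is the kind of decision discussed next.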
As part of your planning, you need to decide, for example, whether you will monitor Windows, Exchange and hardware on that computer or just Windows, because this decision may affect how you size the repository or how you distribute AppManager components.
• Current Management Solutions in Place
It is important to determine whether any systems management software is already in use within the enterprise. If it is from another vendor, it may be necessary to examine some or all of the following areas:
- What servers and applications are currently being monitored?
- Are there problems or gaps in the monitoring?
- How does the current solution handle reporting?
- Will the company save money by retiring the existing application?
- Will AppManager replace the existing monitoring solution?
- Will the AppManager agent run on servers that are running agents for other monitoring solutions?
- Does AppManager have to report into an existing framework solution?
• Operational Processes
What operational processes exist today? How does the IT department perform its day-to-day activities?
- Is there an established change management process?
- Is there a formal Service Desk or Help Desk function?
- Where are events sent for resolution?
- What is the process for responding to service and system outages?
- How are new services and systems brought into the production environment?
- How are problems escalated through the organization?


• Current Skill Sets
- AppManager-specific expertise: Is there existing NetIQ AppManager expertise in the organization? How in-depth is that expertise? Can we leverage the existing expertise as a resource?
- Systems management-specific expertise: Is there existing expertise in systems or network management? How in-depth is that expertise? Can we leverage the existing expertise as a resource?

Activity 3: Perform Gap Analysis
The gap analysis documents the difference between the desired functionality and the functionality that exists in the current solution. Each gap should be documented to describe its nature and its relationship to the new solution. Gaps can fall into any of the following three categories:
• People
- Staffing levels. Are there too few or too many people in the organization?
- Education levels. Is training required?
- Roles and responsibilities. Do we have the right level of person for the job?
• Process
- Processes completely missing
- Processes not clearly defined
- Processes too complicated
• Technology
- Do we have the right technology in place?
- Are we fully utilizing existing technology?
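At its simplest, a gap is whatever appears in “what we want” but not in “what we have”. The Python sketch below expresses that comparison per category (People, Process, Technology) as a set difference. The short labels are hypothetical; real entries come from the Requirements and Inventory documents, and judging whether a capability truly satisfies a requirement still takes people, not string matching.

```python
def gap_analysis(wanted, have):
    """Return unmet requirements grouped by category.

    wanted, have: dicts mapping a category ("People", "Process",
    "Technology") to a set of short requirement or capability labels.
    Categories with no gaps are omitted; each gap list is sorted.
    """
    gaps = {}
    for category, required in wanted.items():
        missing = required - have.get(category, set())
        if missing:
            gaps[category] = sorted(missing)
    return gaps


wanted = {
    "People": {"staff trained on escalation"},
    "Process": {"call escalation process", "change management"},
    "Technology": {"server availability monitoring", "duplicate event suppression"},
}
have = {
    "Process": {"change management"},
    "Technology": {"server availability monitoring"},
}
```

Each entry in the result then needs a remediation statement, as described below.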

Once you have identified gaps, document each gap along with a brief statement about how it can be remedied. The remedial action may involve more than one functional area. For example, a gap may be that critical events are not being responded to in a timely manner. The remediation steps for this may include:
• People. Ensure that all support staff are properly trained on the call escalation process. Base employee bonuses on average incident response time.
• Process. Redesign the call escalation process to include management notification if the initial incident is not acknowledged within ten minutes.
• Technology. Ensure that the monitoring solution can suppress duplicate events for a period of ten minutes.
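The technology remedy above, suppressing duplicate events for ten minutes, can be illustrated with a generic time-window filter. The Python sketch below is not how AppManager implements its own event handling; it only shows the logic you would look for when validating the product’s duplicate-event settings. The event tuple layout is an assumption.

```python
from datetime import datetime, timedelta

SUPPRESSION_WINDOW = timedelta(minutes=10)


def suppress_duplicates(events, window: timedelta = SUPPRESSION_WINDOW):
    """Drop events that repeat an earlier (source, message) pair within
    `window` of the last time that pair was forwarded.

    events: list of (timestamp, source, message) tuples sorted by time.
    Returns the events that would still be forwarded to support staff.
    """
    last_forwarded = {}
    forwarded = []
    for ts, source, message in events:
        key = (source, message)
        previous = last_forwarded.get(key)
        if previous is None or ts - previous >= window:
            forwarded.append((ts, source, message))
            last_forwarded[key] = ts
    # Suppressed repeats do not restart the window; only forwarded events do.
    return forwarded
```

Measuring the window from the last forwarded event, rather than from the last occurrence, guarantees that a persistent problem is re-reported at least every ten minutes.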

DELIVERABLES:
• Requirements Document
- Business Requirements
- Technical Requirements
- Functional Requirements
• Inventory Document
- Security Architecture
- Network Architecture
- Server Architecture
- Current Management Solutions in Place
- Operational Processes
- Current Skill Sets


• Gap Analysis Document
This document should outline the identified gaps along with possible gap resolutions. There are several methods for resolving gaps:
- Modifying business process design, including the incorporation of manual procedures
- Developing an interface to other systems that provide the functionality
- Developing system enhancements (customizations)
- Better leveraging existing technology
- Training for systems and processes

THE PLAN PHASE
The Plan Phase of the methodology is concerned with defining the overall vision and direction of the project. It establishes timelines, cost estimates, resource requirements and most importantly, success criteria for the project.

INPUTS:
• Requirements Document
• Inventory Document
• Gap Analysis
• Organizational Charts

ACTIVITIES:
Activity 1: Identify Resources, Roles and Responsibilities
The following roles are designed as a baseline. Depending on your organization, additional roles, such as a Technical Writer, may be required. A single person may well fill multiple roles. For example, the AppManager Engineer, who is the technical owner of the solution, is in many cases also a member of the Server Engineering or Operations team.
• Project Manager
- Overall project delivery and communications
• Business Owner
- Sr. Management project sponsor
• AppManager Architect
- System design and implementation
• AppManager Engineer
- Technical owner of the solution
- Ensures the performance and availability of the completed solution:
   • Sizing and capacity management
   • Ensuring the solution continues to meet established service levels
- Provides level two troubleshooting and support
- Owns the technical relationship with the vendor
• AppManager Administrator
- Day-to-day operation of the solution
- Level one troubleshooting and support
• Windows Systems Engineer
- Domain structure and trust relationships
- Administrator passwords for all computers where AppManager components are to be installed
- Ability to create and modify user accounts
- Builds the Windows infrastructure components for AppManager



• Network Engineer
- Identifies network bandwidth constraints
- Configures/adds entries for DNS/WINS
• SQL DBA
- System administrator or local administrator privileges for the AppManager repository server
- Knowledge of SQL Server login IDs and users with permission to access system tables
- Experience with SQL Server security modes
- Ability to evaluate the hardware configuration for the computer that will serve as the AppManager repository server
- Knowledge of SQL Server scheduled tasks
- Understanding of ongoing database maintenance, such as backup/restore and consistency checking
• Application-Specific Knowledge
- Varies depending on the applications monitored. Examples include Exchange administrators, Active Directory administrators and Oracle DBAs. These individuals should be leveraged for their in-depth knowledge of the components and applications that need to be monitored in your environment.
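The Project Roles and Responsibilities Matrix can be kept as simple data and checked for two things mentioned above: every baseline role has an owner, and anyone holding several roles is visible so they can budget their time. The Python sketch below does both. The role list is taken from the baseline roles above (application specialists are left out because they vary by environment); the people’s names are hypothetical.

```python
REQUIRED_ROLES = [
    "Project Manager", "Business Owner", "AppManager Architect",
    "AppManager Engineer", "AppManager Administrator",
    "Windows Systems Engineer", "Network Engineer", "SQL DBA",
]


def check_roles(assignments, required_roles=REQUIRED_ROLES):
    """Validate a simple roles-and-responsibilities matrix.

    assignments maps a person's name to the set of roles they hold.
    Returns (unfilled_roles, multi_role_people):
    unfilled roles keep the order of required_roles;
    people holding more than one role are sorted by name.
    """
    filled = set()
    for roles in assignments.values():
        filled |= roles
    unfilled = [role for role in required_roles if role not in filled]
    multi_role = sorted(name for name, roles in assignments.items() if len(roles) > 1)
    return unfilled, multi_role
```

In a small organization the multi-role list is expected to be long; the point is to make those overlaps explicit rather than to eliminate them.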

Activity 2: Define Project Vision
Having a clearly defined vision that all team members agree with is vitally important. A clearly defined vision keeps the project team focused on the overall end goal rather than getting caught up in individual tasks.
• Questions to ask of the project team:
- What are the overall business objectives?
- What are the driving factors for implementing a Service/Systems Management solution?
- What is the company’s experience with Service and Systems Management to date?
- What is the overall vision for the deployment and operations?
- Why was AppManager chosen?
Once the vision is crafted, review it regularly to ensure that all team members remain focused.

Activity 3: Define Project Scope
Early in the project, it is important to clearly define the overall scope of the project. Is this project going to be a massive undertaking to implement complete end-to-end monitoring across a very large global IT infrastructure for a Fortune 100 organization, or will it focus on implementing base server monitoring for a mid-sized law firm? Understanding the scope is important in determining overall timelines and resource requirements. The project scope determines how far and wide the project reaches and covers the following:
• Design a preliminary AppManager architecture
• Develop a high-level implementation strategy
• Plan for integration with other systems
• How many systems or devices will be monitored?
• What reporting needs to be deployed?


Activity 4: Define the Project Plan, including major activities, milestones, schedules and assignments
A well-defined project plan is vital to the success of the project. The project plan serves as a roadmap for the rest of the project by documenting the individual activities, their durations and who is responsible for completing each one. It is acceptable for the project plan to be updated as the project moves forward; timelines and deliverable dates should be adjusted to reflect current status. Many factors can influence the timeline, including delays in receiving hardware and software, staff availability, and changes to business conditions and priorities. For a complete deployment, a good initial guideline is to allow two (2) weeks of implementation time per 100 agents to be deployed, in addition to the time spent on planning and designing the solution. You can build your initial project plan by importing the phases and activities from this guidebook into your project management software.

Project phase: Planning
Average time: 5 to 15 days
Factors that may increase time needed:
- Delays in collecting the necessary information
- Availability of key personnel or difficulties in the approval process
- Difficulty identifying who needs to be notified of events and how notification should be handled
- Difficulty producing required documentation (for example, the time it takes to document the project plan or the scope of the project)
- Project team availability

Project phase: Design
Average time: 5 to 10 days
Factors that may increase time needed:
- Access to a testing laboratory
- Permission to create user or database accounts
- Training for team members
- Changes to management strategy (for example, jobs and event handling)

Project phase: Deploy – Pilot Deployment
Average time: 1 to 4 weeks
Factors that may increase time needed:
- Changing system standards
- Access to the pilot group of servers
- Problems with domain and trust relationships
- Problems with network name resolution
- Refinement of management strategy
- AppManager configuration issues. For example, if an organization has grown dramatically or in unexpected ways, the project team may need to reevaluate decisions such as a centralized management site versus multiple decentralized sites.
- Change Management delays

Project phase: Deploy – Production Deployment
Average time: 1 to 10 weeks
Factors that may increase time needed:
- Physical location of servers
- Department policies and procedures
- Problems with domain and trust relationships
- Permission to create user or database accounts
- Bandwidth or latency issues to be resolved
- AppManager configuration issues. For example, if an organization has grown dramatically or in unexpected ways, the project team may need to reevaluate decisions such as a centralized management site versus multiple decentralized sites.
- Inadequate testing of the full solution
- Change Management delays

Project phase: Manage
Average time: Ongoing
Factors that may increase time needed:
- Availability of resources
- Knowledge of the deployed solution
- Lack of communication between stakeholders

DELIVERABLES:
• Project Vision and Scope Statement
• Project Roles and Responsibilities Matrix
• Project Plan

THE DESIGN PHASE
The Design Phase focuses on putting together the logical and physical design for both the AppManager infrastructure and the monitors that will be deployed.

INPUTS:
• Requirements Document from the Assess Phase
• Inventory from the Assess Phase
• Project Scope from the Plan Phase
• Project Vision from the Plan Phase

ACTIVITIES:
Activity 1: Project Team Education/Training
Before any design work can be completed, it is important that the project team fully understands the NetIQ AppManager components that will be deployed. If AppManager is new to the organization, it is strongly advised that key members of the project team attend formal product training. If for some reason training is not an option, consider engaging an outside consultant to assist with the design and initial implementation of your AppManager infrastructure.


NetIQ offers both public and on-site training for the AppManager product line. The education offerings are as follows:
• AppManager Essentials
• AppManager Advanced Monitoring and Troubleshooting (includes Control Center)
Complete details about each of these classes are available online at http://www.netiq.com/training/catalog/default.asp?topic=am&type=instructor-onsite
In addition to standard training, custom training agendas may be constructed based on customer needs, location, etc.

Activity 2: AppManager Logical Infrastructure
AppManager consists of required and optional components that can be installed together on a single computer or separately, in virtually any combination, on multiple computers. Each component has specific system requirements, so in the design process you should determine where you want to install which components. It is especially important to consider the number and placement of management sites. A management site always consists of one AppManager repository and at least one AppManager management server, regardless of where these components are installed. A management site may have multiple management servers to distribute processing and communication for managed clients, but each management site has exactly one repository, and each management server communicates with only one repository. Each managed client needs at least one authorized management server for installing and discovering the AppManager agent. In small to mid-size deployments, a single management server may be sufficient. However, in many organizations it is beneficial to install multiple management servers. If you install multiple management servers within a given management site (that is, for a single repository), you must explicitly designate a primary and secondary management server for each managed client to distribute processing, provide failover support, and control which management servers communicate with which agents.
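During design it can help to keep the planned primary/secondary assignments for one management site as plain data and sanity-check them before configuring anything. The Python sketch below is a hypothetical planning check, not an AppManager API: AppManager stores and manages these assignments itself. It assumes the primary and secondary should differ (an identical pair gives no failover) and estimates each server’s client count if one management server fails and its clients move to their secondary.

```python
from collections import Counter


def validate_assignments(site_servers, clients):
    """Check primary/secondary assignments within one management site.

    site_servers: set of management server names in the site.
    clients: dict of client name -> (primary, secondary).
    Returns a list of human-readable problems (empty if none found).
    """
    problems = []
    for client, (primary, secondary) in clients.items():
        if primary == secondary:
            problems.append(f"{client}: primary and secondary are the same server")
        for ms in (primary, secondary):
            if ms not in site_servers:
                problems.append(f"{client}: {ms} is not a management server in this site")
    return problems


def failover_load(clients, failed):
    """Clients per management server if `failed` goes down and its
    clients move to their secondary server."""
    load = Counter()
    for primary, secondary in clients.values():
        load[secondary if primary == failed else primary] += 1
    load.pop(failed, None)  # the failed server carries nothing
    return load
```

Comparing the `failover_load` figures against the client counts in the sizing guidance shows whether a surviving server would be overloaded.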
Within a management site, a managed client always communicates with only one management server (and, therefore, one repository) at any time. In planning where to install components and how many management servers and management sites you will need, there are several issues to consider:
- When you set AppManager repository preferences for event and data handling, these preferences serve as the defaults for the site (they can be overridden on a job-by-job basis).
- If you are monitoring a large number of managed clients and you can stagger or schedule communication between managed clients and the management server, you may want to use a single centralized site.
- If you have a widely distributed environment or have latency or bandwidth restrictions, you may want to set up regional or divisional management sites for increased network efficiency.
- If your organization's internal policies require more localized operational control over parts of the network and over who has access, you may want to set up separate management sites to provide this control.
- If you have a large number of jobs and expect to generate a large number of events or collect a great deal of data, you may want to have multiple management sites to balance the load or improve performance. Smaller-scale management sites may offer you increased efficiency and performance, but multiple sites will also make system and site administration more complex.
- If you are monitoring a combination of Windows computers and UNIX computers, you may want to devote separate management servers or separate management sites to the Windows agents and the UNIX agents. Managing the platforms separately through dedicated management servers and/or management sites may give you more flexibility in defining policies and tuning performance.
In general, if your organization has a large, widely distributed network, you may want to have different management sites focused on particular groups. In a smaller organization, you may prefer to centralize management in one site.

Planning for Multiple Management Servers
For most organizations, monitoring 300 to 600 managed clients with a typical number of jobs (approximately 15 to 20 jobs running at regular intervals) requires only a single repository. One primary management server is enough, with a second management server as a backup. If you monitor more than 600 managed clients, you should add another primary and backup management server pair. If you monitor more than 1000 to 1200 managed clients, you should implement a second management site with a second dedicated repository. A good rule of thumb is to have one management server per 300 monitored devices. For practical purposes, you should not attempt to monitor more than 1200 managed clients with a single repository. You should also not use more than four management server pairs per repository.

If you plan to use more than one management server, be sure all of the agents you install have been assigned to the first management server before you install the second management server. After installing the second management server, you can adjust the monitoring responsibility if needed. If you are using the second management server as a passive backup, do not use it as the primary management server for any managed clients. If you are distributing the monitoring load between two or more management servers, take failover responsibility into account. Ensure that no single management server will become overloaded if a primary management server fails. For more information about managing primary and secondary management servers, see the AppManager Administrator Guide.
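The sizing guidance above can be condensed into a rough planning helper. The following Python sketch is illustrative only: the function name and output format are hypothetical. It deliberately applies the conservative "one management server per 300 monitored devices" rule of thumb, with one primary per 300 clients and each primary paired with a backup. It also applies the 1200-client ceiling per repository. The 300-to-600-clients-per-pair figure quoted above would yield fewer pairs. Treat the result as an upper-bound starting point for discussion, not a product limit.

```python
import math

# Rule-of-thumb limits from the guidance above (assumptions, not product limits).
CLIENTS_PER_PRIMARY_SERVER = 300   # "one management server per 300 monitored devices"
MAX_CLIENTS_PER_SITE = 1200        # practical ceiling for a single repository


def plan_management_sites(managed_clients: int) -> list[dict]:
    """Estimate management sites and primary/backup server pairs per site."""
    if managed_clients <= 0:
        return []
    site_count = math.ceil(managed_clients / MAX_CLIENTS_PER_SITE)
    plan = []
    remaining = managed_clients
    for site in range(1, site_count + 1):
        # Spread the remaining clients as evenly as possible over the remaining sites.
        clients = math.ceil(remaining / (site_count - site + 1))
        remaining -= clients
        # One primary per 300 clients, each paired with a backup server.
        # A site never holds more than 1200 clients, so this stays at or below 4 pairs.
        pairs = math.ceil(clients / CLIENTS_PER_PRIMARY_SERVER)
        plan.append({"site": site, "clients": clients, "server_pairs": pairs})
    return plan
```

For example, plan_management_sites(1500) returns two sites of 750 managed clients, each with three primary/backup pairs. Because each site is capped at 1200 clients, no site needs more than the four pairs the guidance allows.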
Deciding where to install the management server, repository, and Control Center
As part of planning your management site or sites, you need to determine whether to install the AppManager repository and management server on the same computer or on separate computers. In deciding where to locate the management server and repository, you should consider several factors, such as:
- Who is responsible for the servers you are going to be managing in a given site? How many groups need to manage the servers, and what level of authority or autonomy is needed for each group?
- What is the geographical distribution of the servers and management groups?
- What is the network bandwidth, latency, and normal load between the computer you want to use as the management server and the servers to be managed?
- What is the network bandwidth, latency, and normal load between the computer you want to use as the management server and the associated repository server?
Managed clients need to communicate with the management server to send events and data. However, the most concentrated network traffic that the management server generates is between the management server and the repository database.

In sites supporting 200 or fewer servers, NetIQ recommends locating the repository and management server on the same computer for maximum efficiency and easier maintenance. For sites monitoring more than 200 servers, NetIQ recommends installing the repository and management server on separate computers if the computers are on a 100 Mbps or faster backbone on the same IP subnet. In most cases, even organizations with 200 to 250 servers can use a single computer for both the repository and management server (the preferred configuration), unless the computer is overloaded or experiencing performance problems. If you monitor more than 250 servers, you may first want to consider creating multiple management sites, with the repository and management server installed on the same computer in each site.
This model suits many organizations because it mirrors departmental or geographical divisions already in place. It also allows for distributed management by groups that may already be functionally separate. In other cases, however, it may make sense to install the repository and management server on separate computers even if you are monitoring fewer servers. For example, you may want to install the repository on a computer that is managed by your database administration team, to take advantage of their database expertise. The management server could then go on a computer that is maintained by IT personnel. In planning where to install components, it is just as important to consider your organization's structure and policies as any physical system attributes or network requirements.

Planning for Additional Components
While AppManager is the core of the NetIQ service management solution, additional products exist that can enhance monitoring and reporting activities. These include NetIQ's advanced reporting solution, Analysis Center 2.7, and AppManager Performance Profiler. Including one or more of these additional components may affect the overall design of the solution.

Overview of AppManager components (Logical)


Activity 3: AppManager Physical Backend Infrastructure
Each AppManager component and managed object has specific system requirements, such as memory, disk space, or supported software versions. Keep the following guidelines in mind when planning the system resources for your environment and monitoring needs. For each component, the system resource requirements can be affected by:

Repository
- Number of computers, jobs, events, and data streams in your environment
- Network bandwidth and latency, and where other components are installed
- Historical reporting requirements
- Physical configuration of SQL Server. NetIQ recommends following Microsoft best practices for configuring SQL Server (http://technet.microsoft.com/en-us/sqlserver/bb331794.aspx).

Management server
- Number of computers you are monitoring
- Number and frequency of events in your environment
- Number and frequency of data points collected
- Network bandwidth and latency

Control Center
- Number of repositories integrated into Control Center
- Number of Service Map views created
- Agent deployment services

Operator Console
- Number of consoles open
- The preferences and options you have set (for example, the views, panes, and tabs you decide to display)
- Number of computers and details displayed in the TreeView
- Number of jobs, events, data streams, and active, real-time graphs you elect to display

Agent
- Number of jobs running on the computer
- Number of server applications you are monitoring
- Interval at which the jobs run
- Types of jobs you run (some jobs perform multiple or more complex tasks than others, and so use more resources)


Sizing the AppManager repository
Many factors influence the configuration of the database. Two of the most important are the number of events you expect and the number of data points you intend to collect and save for historical reporting or trend analysis. This information is often difficult to estimate before you install, and it is likely to change as you expand and refine your deployment strategy. NetIQ therefore recommends the following as a starting point:
- Count the number of servers you plan to monitor and multiply that number by 2 MB to account for the events and data each will generate. Multiply the result by the number of days you intend to keep data in the repository:
  (Number of Agents × Number of Days to Retain Data) × 2 MB = Estimated Repository Size
  For example, assume you have 180 servers that you intend to monitor, and you want to retain data in the repository for 30 days to generate monthly reports. From this, you can estimate that your database is likely to need about 11 GB (180 × 30 × 2 MB = 10,800 MB) for full deployment and data collection.
- During installation, set the initial database size to the estimated size plus 20%. Set the initial log device to 30% of the estimated database size, and keep the data and the log on separate devices.
Sizing the initial database along these guidelines is a good starting point in most environments.
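The repository sizing rule can be expressed as a small calculation. This Python sketch is a hypothetical helper, not a NetIQ tool. It assumes the 2 MB per agent per day figure above and rounds the 20% headroom and 30% log allowances up to whole megabytes.

```python
import math

MB_PER_AGENT_PER_DAY = 2  # starting-point figure from the guidance above


def size_repository(agents: int, retention_days: int) -> dict:
    """Return starting-point repository sizes in MB: estimate, data device, log device."""
    estimated_mb = agents * retention_days * MB_PER_AGENT_PER_DAY
    return {
        "estimated_mb": estimated_mb,
        # Initial database size: the estimate plus 20% headroom, rounded up.
        "initial_database_mb": math.ceil(estimated_mb * 120 / 100),
        # Initial log device: 30% of the estimated database size, rounded up.
        "initial_log_mb": math.ceil(estimated_mb * 30 / 100),
    }


sizes = size_repository(agents=180, retention_days=30)
print(sizes)  # {'estimated_mb': 10800, 'initial_database_mb': 12960, 'initial_log_mb': 3240}
```

The 10,800 MB estimate for 180 servers over 30 days is about 10.5 GB, which the example above rounds to 11 GB.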

Example Physical Configuration
The diagrams discussed in this document are intended as examples only, as there is no “one size fits all” design. Use these diagrams as starting points for discussion. Hardware availability is likely to drive the choice of architecture. For example, if hardware is limited, multiple management servers and a dedicated web/report agent server may not be feasible.


Diagram 1: Small Environment (<150 Managed Servers)
- Single server for repository and management server


Diagram 2: Mid Environment (200 to 600 Managed Servers)
- Dedicated repository server
- Dedicated, multiple management servers
- Dedicated web management server


Diagram 3: Mid-Large Environment (600+ Monitored Systems)
- Multiple management sites


Activity 4: AppManager Security Model
Because the AppManager repository is a SQL Server database, AppManager security is fundamentally based on SQL Server security. Every user who needs access to AppManager must have a valid SQL Server login name and password for the SQL Server where the AppManager repository database is running. How those SQL Server login accounts are created and authenticated at connection time depends on the SQL Server security mode you use. Therefore, before creating any AppManager users, you should determine which SQL Server security mode you are using. SQL Server can be configured to use:
- Windows Authentication Only security, which links SQL Server login accounts with Windows user accounts and uses Windows account authentication to validate SQL Server logins for all connections.
- Data Encryption
  - RPC encryption is available for Windows agents.
  - 128-bit RC4 encryption with the SHA hash algorithm is available for UNIX agents.

Managing users with Windows groups
In addition to understanding SQL Server security modes, you should consider using Windows groups as the most effective way to manage user accounts. You can create groups using your standard Windows administrative tools and then map an entire group to a single SQL Server login. Once you have created the SQL Server login for the group, all privileges assigned to that login through SQL Server and AppManager apply to all of the member user accounts within that Windows group. Once you grant the SQL Server login account permission to access the AppManager repository, you can use Security Manager to add the group account as a new AppManager user. It is common for a user to belong to more than one Windows group, but you should avoid this when using Windows groups for AppManager users. Maintaining security can become problematic if a user belongs to more than one Windows group that has been mapped to a SQL Server login account and added to AppManager.
For example, suppose a user belongs to two Windows groups that have been given different privileges or assigned different AppManager roles. That user may have unexpected or conflicting rights or restrictions. The best way to ensure consistency and manageability is to create new Windows groups specifically for each AppManager role you plan to define. Using Security Manager, you can specify the individual functional rights for viewing information and performing tasks that you want available for each role. For example, suppose two AppManager roles are available (Read-Only User and Sr. Admin). You can create two corresponding Windows groups called AppManager Read-Only and AppManager Sr. Admin, and then set the functional rights for each group of users differently.

Note: When creating Windows user accounts and groups to access AppManager, keep in mind that specific privileges may be required to perform certain tasks. For example, any Windows user account or group that is used to log on to the Operator Console must be granted Write permission for the NetIQ\AppManager\bin\cache folder.

Understanding AppManager Roles
Using Security Manager, you identify the SQL Server users that can log on to each AppManager repository. AppManager roles enable you to define what different groups of users can see and do within AppManager consoles. For example, you may want to prevent some users from starting and stopping jobs, closing events, or changing job properties. For each AppManager role, you define the specific rights you want the users in that role to have. This collection of rights associated with an AppManager role is called a security profile. Each time you add a new user, you select the AppManager role that matches the tasks that user needs to perform. The rights you can set for AppManager roles include:
- Access to basic AppManager functions, such as whether users can run Knowledge Script jobs, acknowledge and close events, or modify the TreeView.
- Permission to start AppManager console programs such as the Chart Console, Repository Browser, and Distributed Event Console.
- Access to the different views in the Operator Console and Operator Web Console.
- Access to advanced AppManager operations, such as Knowledge Script property propagation, the ability to modify monitoring policies, or permission to put a computer in maintenance mode.
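Conflicting rights usually come from users who belong to more than one AppManager-mapped Windows group. It can therefore be worth auditing group membership before adding groups in Security Manager. The Python sketch below assumes you have already exported the membership of each mapped group (for example, from Active Directory) into a dictionary. The group names follow the example above; the user accounts and the function name are hypothetical.

```python
from collections import defaultdict


def find_multi_group_users(group_members: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return users that appear in more than one AppManager-mapped Windows group."""
    groups_by_user = defaultdict(list)
    for group, members in group_members.items():
        for user in members:
            groups_by_user[user].append(group)
    # Keep only users whose effective rights could come from two or more roles.
    return {user: sorted(groups) for user, groups in groups_by_user.items() if len(groups) > 1}


# Hypothetical membership data for the two example groups.
mapped_groups = {
    "AppManager Read-Only": {r"CORP\alice", r"CORP\bob"},
    "AppManager Sr. Admin": {r"CORP\carol", r"CORP\bob"},
}
print(find_multi_group_users(mapped_groups))  # {'CORP\\bob': ['AppManager Read-Only', 'AppManager Sr. Admin']}
```

Any user the check reports should be moved into a single role-specific group before the groups are added as AppManager users.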

Security Manager includes three predefined roles that you can modify to suit your needs. You can also create your own custom roles. In general, you should use roles to tightly restrict access to many AppManager features and capabilities. Initially, you should allow only site administrators or expert-level administrators to perform most tasks. You should also limit access to AppManager to a small number of people until you have firmly established site policies and role definitions that suit your organization. Your production environment should be stable before you widen access. Your threshold settings, job properties, event handling, and data handling policies should also be refined to meet your organization's needs. At that point, you may want to grant more operators and administrators access to AppManager.

Activity 5: AppManager Integration with Other Tools
AppManager offers outstanding flexibility with regard to integration with other toolsets. Out-of-the-box connectors are available for most major “framework” systems management tools, such as Tivoli Enterprise Console, Micromuse Netcool, and HP OpenView. In addition, AppManager has built-in connectors for industry-leading service desk applications such as Remedy. Other software vendors offer AppManager connectors for their tools; two leading examples are Managed Objects Formula and SMARTS. In all cases, check with your vendors for minimum application requirements.

Activity 6: Monitoring Strategy Development
Planning your deployment has one additional key element: deciding on your core management tasks. You will most likely refine your management policies throughout the deployment cycle, so treat any strategy you define in the planning phase as a work in progress. Even so, planning ahead can help you decide which components you need to install on which computers.
Identifying management goals
Different organizations have different goals, and those goals influence which Knowledge Scripts they run, their event notification policies, and their data collection requirements. As a starting point, you should identify your primary and secondary objectives. For example, are you most concerned with:
- Immediate event notification when something goes wrong
- Proactive management that allows you to see and respond to potential problems before something goes wrong
- Automating event notification, acknowledgment, and response
- Ongoing analysis and capacity planning
- Service management monitoring
  - Service availability
  - Service performance
It is usually a good idea to define monitoring based on high-level requirements such as availability monitoring, fault monitoring, performance monitoring, and capacity monitoring. Grouping monitoring into categories like this helps ensure that each specific requirement is met. It also allows you to stage the deployment of monitoring. For example, you may roll out availability and fault monitoring as part of stage one. Then, when you are comfortable, you can roll out performance and capacity monitors.
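A staged rollout like this can be tracked directly from the monitoring policy worksheet. In the hypothetical Python sketch below, each monitor is tagged with one of the four high-level categories. The stage numbers follow the example above: availability and fault monitoring first, performance and capacity monitoring later. The monitor names are descriptive placeholders, not actual Knowledge Script names.

```python
from dataclasses import dataclass

# Rollout stage per monitoring category (an assumption based on the example above).
STAGE_BY_CATEGORY = {"availability": 1, "fault": 1, "performance": 2, "capacity": 2}


@dataclass
class Monitor:
    name: str          # descriptive label, not a real Knowledge Script name
    category: str      # one of the keys in STAGE_BY_CATEGORY
    interval_min: int  # how often the job runs, in minutes


def monitors_for_stage(monitors: list[Monitor], stage: int) -> list[str]:
    """Return the names of the monitors to deploy in the given stage."""
    for monitor in monitors:
        if monitor.category not in STAGE_BY_CATEGORY:
            raise ValueError(f"unknown category {monitor.category!r} for {monitor.name}")
    return [m.name for m in monitors if STAGE_BY_CATEGORY[m.category] == stage]


policy = [
    Monitor("Server up/down", "availability", 5),
    Monitor("Event log errors", "fault", 10),
    Monitor("CPU utilization", "performance", 15),
    Monitor("Free disk space trend", "capacity", 60),
]
print(monitors_for_stage(policy, 1))  # ['Server up/down', 'Event log errors']
```

Rejecting unknown categories up front keeps the worksheet honest: every monitor must be assigned to one of the agreed categories before it can be scheduled.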


Define Operating System/Hardware Monitoring
A good way to begin developing your management strategy is to monitor what is most important to your organization. A simple way to approach this is to evaluate three types of monitoring:
- Availability and Fault Monitoring – This is monitoring for system availability and errors. Typical monitors in this category include system up/down, service and process up/down, hardware health, and event log monitoring. Alerts are usually generated for these types of monitors.
- Performance Monitoring – Performance metrics are collected to provide information for performance reporting; however, some performance counters can serve as “early warning” signs of system failure. CPU utilization, disk I/O, and memory utilization are typical counters to look at. In some cases, alerts can be set up for excessive utilization.
- Capacity Monitoring – Many of the same metrics that are collected for performance monitoring are also used for capacity monitoring.
Create a spreadsheet that describes the Knowledge Scripts, intervals, and thresholds you intend to start with. If you are a registered AppManager customer, you can obtain recommendations for which Knowledge Scripts to run from NetIQ Support or the AppManager support web site. An additional list of recommended Knowledge Scripts is located on the installation CD in the \appmanager\extras\best_practices directory.

Define Application Monitoring
As part of your planning process, you should already have identified the applications you want to monitor and the characteristics of your environments. For most applications, no additional preparation is needed. Some applications, however, do have special requirements or configuration issues. For example, monitoring Exchange requires a user account, profile, and mailbox. SQL Server or Oracle database monitoring also requires a user account in the database.
Because these requirements are strictly application-specific and not part of the typical installation process, detailed information is not provided in this guide. Instead, application-specific configuration notes are provided separately in the appmanager\documentation\configuration_notes folder. Before you install, check this folder for application-specific details you may need to consider. It also lists steps you may need to perform before or after you install the AppManager agents, such as creating new accounts, modifying user rights, or locating a computer for proxy management.

Define Custom Monitoring Requirements
Although custom monitoring is not often required, many organizations find it useful to identify any custom Knowledge Script requirements during the planning phase. At a minimum, consider whether you intend to do custom Knowledge Script development. This tells you whether to install the Developers Console Utilities and where these programs should be installed.

Activity 7: Create Standard Monitoring Policy Documentation
It is important to clearly document the standard monitoring policies that you deploy in an easy-to-read format. This makes it easier to advertise your standard monitoring offerings to your customers. A Standard Monitoring Policy template is included in Appendix A.

Activity 8: Reporting Strategy Development
NetIQ AppManager offers a very robust and feature-rich reporting infrastructure. It is important to plan your reporting as part of the initial deployment so that any additional data you need to report against can be included in standard monitoring policies. Reporting options available in the base AppManager product include Base AppManager Reporting as well as the Reporting Extras package, which is available from the AppManager web site. For more advanced reporting needs, consider Analysis Center.


Activity 9: Create Deployment Guide Outlining Architecture Installation Steps
Many customers choose to create their own installation guides specific to their environment. This can be done for any number of reasons, including the need to properly document the steps taken in support of regulatory (FDA, FCC, SEC, etc.) or business (SOX, etc.) requirements. These documents may include screenshots taken from an installation done in a test lab. In many cases, referring to the NetIQ AppManager Installation Guide may be sufficient. At a high level, AppManager components should be installed in the following order:
- AppManager Repository
- Primary Management Server
- Secondary Management Server (if necessary)
- Operator Console
- Control Center
- Web Management Server
- Agents

Activity 10: Define Agent Deployment Tasks, Including Scope of Pilot
Depending on the size of your environment, you may want to perform the agent rollout in phases. It is recommended that you start your deployment with a small pilot group of servers to verify functionality before deploying to the entire enterprise. Once the functionality and settings are verified, you may elect to deploy to mission-critical servers first, followed by production servers and, finally, any other servers in your environment.

Activity 11: Update Project Plan Based on Design with Revised Resources, Activities, and Timelines
You may need to update the activities and timelines in your project plan based on the final design. If you are ordering and installing new hardware, account for order processing and shipping times. Also allow for the time it takes to configure the server hardware, operating systems, and supporting software.
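The phased agent rollout described in Activity 10 can be sketched as a simple wave planner. This Python example is hypothetical. It assumes each server has already been tagged as mission-critical, production, or other, and it borrows the 10% pilot size from the Deploy phase guidance. The pilot is filled round-robin across tiers so that it includes a cross-section of the environment. The remaining servers follow in the order the text recommends.

```python
import math

# Deployment order after the pilot, as recommended in Activity 10.
TIER_ORDER = ["mission-critical", "production", "other"]


def plan_rollout(servers: dict[str, str], pilot_percent: int = 10) -> list[list[str]]:
    """Return rollout waves in order: pilot, mission-critical, production, other."""
    unknown = set(servers.values()) - set(TIER_ORDER)
    if unknown:
        raise ValueError(f"unknown tiers: {sorted(unknown)}")
    by_tier = {t: sorted(n for n, tier in servers.items() if tier == t) for t in TIER_ORDER}
    pilot_size = math.ceil(len(servers) * pilot_percent / 100)
    # Take one server from each tier in turn until the pilot is full,
    # so the pilot is a cross-section rather than a single tier.
    queues = [list(by_tier[t]) for t in TIER_ORDER]
    pilot = []
    while len(pilot) < pilot_size and any(queues):
        for queue in queues:
            if queue and len(pilot) < pilot_size:
                pilot.append(queue.pop(0))
    in_pilot = set(pilot)
    later_waves = [[n for n in by_tier[t] if n not in in_pilot] for t in TIER_ORDER]
    return [pilot] + later_waves


# Hypothetical inventory: 3 mission-critical, 5 production, 4 other servers.
servers = {f"app0{i}": "mission-critical" for i in range(1, 4)}
servers.update({f"web0{i}": "production" for i in range(1, 6)})
servers.update({f"dev0{i}": "other" for i in range(1, 5)})
for wave in plan_rollout(servers):
    print(wave)
```

With these 12 servers, the pilot contains two servers (10% of 12, rounded up): one mission-critical and one production. Every server appears in exactly one wave.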

DELIVERABLES:
- Updated Project Plan
- Solution Design Specification
  - AppManager Logical Design
  - AppManager Physical Design
  - Monitoring Policy Worksheet
  - AppManager Security Design
- Change Request Documentation
  - New Server Builds
  - SQL Database Security
  - Security Requests
    - Firewall Changes
    - AD Account Creation


THE DEPLOY PHASE

The implementation phase focuses on the buildup and phased rollout of the AppManager solution.

INPUTS:
- Solution Design Specification
- Change Requests
- Implementation Guide
- NetIQ AppManager Installation Guide

ACTIVITIES:
Activity 1: Build Solution Architecture
It is vitally important that the servers that will house the AppManager infrastructure are working properly. As with any application, a failure at the hardware or operating system level will result in application failures.
- Server hardware installation and configuration
- Server operating system installation and configuration
- SQL Server installation and configuration
- IIS/additional software installation and configuration
  - Anti-virus
  - Patch management, etc.

Activity 2: Install AppManager Software
For detailed installation instructions for each component, refer to the AppManager Installation Guide, which is located in the documents directory on the installation CD.
- AppManager Repository
- AppManager Management Server(s)
- AppManager Operator Console
- Control Center
- AppManager Web Console
- AppManager Report Agent(s)

Activity 3: Pilot Deployment of Agents
It is recommended that you first deploy monitoring to a small pilot group of servers. These servers should include a cross-section of all server types that will have monitoring deployed to them. The pilot rollout will help ensure that the configurations of both the agent and monitoring are reliable.
- Subset of production deployment (10%)

Activity 4: Deployment of Monitoring Policies
Monitoring policies should be deployed one at a time to ensure that they are providing the required monitoring. Allow yourself time to evaluate the results of each individual monitor. In general, it is best to start with OS and hardware monitoring deployments, as these tend to be the most consistently deployed.
- OS/HW
- Database
- Exchange/e-mail
- Applications
Additional monitoring not included in base monitoring policies should be deployed at this time as well. Examples of monitors that may not be included in monitoring policies include application-specific monitors as well as monitors that should not be deployed across all systems. For example, your base OS monitoring policy may set disk space monitoring thresholds to 80%. Suppose a critical application in your environment requires that administrators be notified when a disk reaches 75% capacity. In that case, you would deploy an additional disk space monitoring script, with the threshold set to 75%, to the server hosting this application.

Activity 5: Deployment of Baseline Reporting
After you have collected about a week's worth of data, you should begin publishing your reports. Setting up and publishing reports without sufficient data is not advisable.
- Daily/weekly/monthly availability management
- Daily/weekly/monthly capacity management
- Daily/weekly/monthly performance management
- Daily/weekly/monthly application-specific reporting

Activity 6: Configure and Deploy Actions
Once the monitoring has been deployed and tuned, you should enable actions. Actions should be configured to run only when an actual alert condition exists. Examples of actions include:
- Critical pager alert
- Major e-mail alert
- Restarting a stopped service
- Executing a script from the command line to delete temporary files

Activity 7: Integration with Third-Party Tools
- Install out-of-the-box connectors
- Configure event forwarding
- Configure bi-directional communications (if applicable)

Activity 8: Evaluate Pilot Results
After the pilot period, you should evaluate the quality and quantity of the data and/or events that you are receiving. Have all usability, functional, and technical requirements been met? Do you need to add additional monitors for certain applications? Are the generated reports easy to understand?
If changes need to be made, document these required changes and work on them one at a time.

Activity 9: Fine Tuning
Some fine tuning of monitoring or deployment parameters is usually required after the pilot deployment. Based on your evaluation of the pilot results, you may want or need to make changes in the following areas:
- Monitoring
  - Thresholds for CPU and memory utilization
    - Raise or lower based on your results
  - Condition present/not present
    - If you have implemented self-healing features (restarting a stopped service, for example), you may want to lower the alert severity for these jobs when a service is successfully restarted.
- Alerting
  - You may want to leverage the advanced job parameters to reduce the sensitivity or frequency of alerts. For example, you can configure jobs to alert only if an event condition occurs for two consecutive job iterations (sensitivity). You may also want to configure jobs not to send alerts for a period of time after the initial alert if the condition is still present.
- Reporting
  - Verify that reports are being delivered to the appropriate people
  - Ensure that the report directory structure is easy to navigate
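Simulating the sensitivity and suppression settings can show how they interact before you change job properties. The Python sketch below is not AppManager's implementation; the function, its parameters, and the sample values are hypothetical. It raises an alert only after the threshold has been exceeded for a set number of consecutive job iterations. While the condition persists, it then suppresses repeat alerts for a fixed period.

```python
def alert_iterations(values: list[float], threshold: float, consecutive: int = 2,
                     suppress_minutes: int = 60, interval_minutes: int = 15) -> list[int]:
    """Return the indexes of job iterations that would send an alert."""
    alerts = []
    streak = 0                # consecutive iterations over the threshold
    last_alert_minute = None  # when the last alert was sent for this condition
    for i, value in enumerate(values):
        minute = i * interval_minutes
        if value <= threshold:
            # Condition cleared: reset so the next occurrence starts fresh.
            streak = 0
            last_alert_minute = None
            continue
        streak += 1
        if streak < consecutive:
            continue  # not yet seen on enough consecutive iterations
        if last_alert_minute is None or minute - last_alert_minute >= suppress_minutes:
            alerts.append(i)
            last_alert_minute = minute
    return alerts


# Hypothetical CPU utilization samples (%) from a job that runs every 15 minutes.
samples = [95, 70, 92, 93, 94, 96, 97, 98, 91, 60, 99, 99]
print(alert_iterations(samples, threshold=90))  # [3, 7, 11]
```

In this run, the isolated spike at iteration 0 never alerts. The sustained condition alerts at iteration 3 and again at iteration 7, once the 60-minute suppression window has passed. After the condition clears at iteration 9, a new occurrence alerts at iteration 11.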

Activity 10: Develop Operational Documentation
- Run book of daily administrative tasks
- Application backup/recovery procedures
- Upgrade procedures
- Customer-facing service offering

Activity 11: Update Solution Design Document

Activity 12: Production Deployment of Agents
Once all changes have been made and production documentation is in place, you should continue your deployment to the rest of the devices to be monitored. Strategies vary greatly in this area and will depend on the configuration and change management processes in your organization.

DELIVERABLES:
- Updated Project Plan
- Updated Solution Design Specification
  - AppManager Logical Design
  - AppManager Physical Design
  - Monitoring Policy Worksheet
  - AppManager Security Design


THE MANAGE PHASE
The Manage phase focuses on the systematic review and operation of the deployed AppManager Solution. It will provide a vehicle for the system owner to verify customer satisfaction, review performance and implement required changes and enhancements.

INPUTS:
- Solution Design Specification
- Requirements Document

ACTIVITIES:
- Requirements Review
  - Verify that the solution is still meeting all requirements
  - Have any requirements changed?
  - Is addition
- Functional Review of AppManager Solution
  - How is the solution being used?
  - Is monitoring deployed consistently?
- Technical Review of AppManager Solution
  - Performance
    - Reporting
    - Alerting
    - Usability
  - Availability
    - Uptime
  - Scalability
    - Is currently deployed monitoring within the limits of the architecture?
- Usability Review of AppManager Solution
  - Are customers utilizing the reports and data?
  - Are additional reports or alerts required?
- Document Required Changes
  - All required changes should be documented
  - Issue application or infrastructure change requests
- Implement Changes

DELIVERABLES:
- Recommended Changes Document


APPENDIX A – SAMPLE DOCUMENTS

REQUIREMENTS DEFINITION DOCUMENT
Requirement Number: 1
Requirement Type (U, F, T): F
Description: System must be able to generate alerts for performance problems
Rationale: Performance problems often lead to availability problems. It is critical that support personnel be notified early to evaluate if corrective action is required.

INVENTORY DOCUMENT

Location Inventory
- Name: HQ – New York
- # Employees: 2800
- # Servers: 685
- Major Applications in Use: All
- Notes:

Business Application Inventory
- Application Name: SAP
- Internal or External Facing: Internal
- Primary Locations Served: All
- Business and Technical Owners: Bob Smith
- Notes: Will be upgraded in Q3

IT Inventory
- System Name: See Server Database
- OS and Version:
- Location:
- Role:
- Notes:

Current Service, Systems, Network or Enterprise Management Solutions
- Application Name: Cisco Work
- Version: 2.0
- Location: HQ
- Business and Technical Owners: Jerry Grant
- Notes: Config mgmt


MANAGEMENT STRATEGY & VISION
Global Enterprise Management Services Strategic Plan – FY 2005 through 2007
Building a Stronger <Company Name> Through Effective Infrastructure Management
The Enterprise Management Services (EMS) group has begun to institute an enterprise-wide IT resource management system to support the current and future global infrastructure. When fully implemented, the solution will provide integrated remote monitoring and management of all IT hardware, software, and networking components, as well as business-critical applications. It will also be integrated into the help desk to facilitate end-user support. The strategic vision for EMS is based on the integration of solutions currently in use within the company. This capability will provide the following services:
- Real-time monitoring and troubleshooting of all infrastructure components
- Real-time monitoring and troubleshooting of business-critical applications
- Configuration management and performance tracking of equipment, software, and facilities
- Integration with the help desk to support end users
- Capacity and performance planning and management
- Pre-deployment baselining of business-critical applications
- Management reporting on IT performance and availability
- Management reporting on enterprise application performance and availability
The current effort entails both short-term and more strategic initiatives. Under the short-term effort, EMS will integrate existing automated tools and databases to provide an interim capability. This capability will support urgent requirements such as server consolidation and the integration of the help desk and change management systems. Through this strategic objective, EMS will expand on this interim capability and deploy the solution company-wide over the next two years. The solution will be fully integrated and supported by central tools and databases that provide real-time information to all levels of IT support staff and management.
The scope of the solution will expand to cover Domestic as well as Global infrastructure, applications, databases, systems and end-user devices. Help desk integration and consolidation will continue, with many redundant management tools eliminated and others tightly integrated into the central support infrastructure.

The benefits to be derived from EMS are:
- Cost avoidance: reducing the rapid growth in the number of systems deployed and in the need for highly skilled, expensive staff to support the infrastructure.
- Infrastructure reliability: networks and systems will perform reliably and IT resources will be available as expected.
- Security: careful monitoring and support will aid in detecting intruders and other anomalies that may signal security breaches.
- Customer support: ready availability of accurate, timely information and automated tools will enable IT support staff to provide the best possible support to end users.
- Infrastructure evolution: effective management and monitoring will help management make timely decisions regarding the acquisition, replacement and upgrading of equipment, software, services and facilities.

EMS will pursue the following strategies in implementing the strategic systems management solution:
- Build on current efforts: EMS will use the interim network and system management efforts currently underway to gain experience with enterprise-wide IT management, and will then be well positioned to expand these efforts over the longer term.


- Exploit “best of breed” technologies and approaches: EMS has researched, and will continue to research, industry IT best practices in enterprise systems management, and will seek out effective partnerships to capitalize on industry success.
- Integrate and coordinate: to deploy an effective enterprise systems management solution in the global environment, EMS will require high levels of coordination among support teams and with international groups. EMS will integrate multiple technologies and approaches in a modular fashion to achieve flexibility and to allow technological advances to be incorporated over time.
- Standard operating environment (SOE): the solution will include a common set of systems management configuration standards for all networks, servers and other IT assets, thus constituting an SOE. The SOE will increase system reliability and manageability, while contributing to a favorable total cost of ownership (TCO) by reducing the number of monitoring software standards deployed.

Measures
Performance measures for this objective are listed below under the first year in which they are measured.

Major Milestones

FY2005
Milestones: Identification of “best of breed” tools and processes; common operating environment (COE) documented and integrated into deployment and support processes; deployment to in-place production-class devices.
Output: Growth in use of EMS services; extent to which problems are identified through EMS rather than in response to user complaints; network and system availability and reliability.
Outcome: Average and range of times to respond to users and to correct problems will decrease.

FY2006
Milestones: Final solution deployed at 100% of Domestic sites and key International sites; consolidated management dashboard for infrastructure and applications.
Output: Reduction in the number of separate management consoles; reduction in the number of network management centers.
Outcome: Average support cost per end user decreases; management and end-user perception of the quality of support services increases.

FY2007
Milestones: EMS and SOE deployed company-wide.
Outcome: Reduction in lost work hours resulting from unavailability of systems.

EXTERNAL FACTORS
Other than the availability of funding and coordination between support teams, there are no external factors that will have a substantial impact on the achievement of this objective.


STANDARD MONITORING POLICY

Purpose
Define the Enterprise Management Services monitors that will be run on Microsoft Windows 2000 servers.

Scope
This policy applies to all servers running the Windows 2000 Server, Windows 2000 Advanced Server and Windows 2000 Datacenter Server operating systems.

Applicable Documents
Document Number | Document Title
If applicable | System Performance Monitoring

Responsibilities
- The “Server Engineering\Operation” team is responsible for ensuring that the monitors described in this document meet current monitoring requirements.
- The Enterprise Management Services group is responsible for the accuracy, implementation and maintenance of this procedure.

Standard Monitors

CPU SUBSYSTEM MONITORS
Perfmon Counter | Alert/Data | Polling Period | Threshold | Description
Processor\% Processor Time | Both | 10 min | 90% | % Processor Time is the percentage of time that the processor is executing a non-idle thread. This counter was designed as a primary indicator of processor activity.
Processor\Interrupts/sec | Both | 10 min | 600 | Interrupts/sec is the average number of hardware interrupts the processor is receiving and servicing each second. It does not include deferred procedure calls (DPCs), which are counted separately. This value is an indirect indicator of the activity of devices that generate interrupts.
System\Processor Queue Length | Both | 10 min | 2 | Processor Queue Length is the number of threads in the processor queue. There is a single queue for processor time even on computers with multiple processors. A sustained processor queue of more than two threads generally indicates processor congestion.
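The CPU rows can be read as data: each counter has a collection mode (Alert, Data or Both), a polling period and a threshold. The sketch below is a hypothetical illustration of how a poll cycle might apply such a row, not AppManager's own implementation. The `Monitor` class, the `evaluate` function and the three-poll window used to interpret "sustained" are assumptions made for this example; for simplicity the same window is applied to every counter.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Monitor:
    """One row of the CPU subsystem policy (hypothetical representation)."""
    counter: str
    mode: str            # "Alert", "Data" or "Both", as in the policy table
    poll_minutes: int
    threshold: float


CPU_MONITORS = [
    Monitor(r"Processor\% Processor Time", "Both", 10, 90.0),
    Monitor(r"Processor\Interrupts/sec", "Both", 10, 600.0),
    Monitor(r"System\Processor Queue Length", "Both", 10, 2.0),
]

# Assumption: "sustained" means the last N polls all exceeded the threshold.
SUSTAINED_POLLS = 3


def evaluate(monitor: Monitor, samples: List[float]) -> Dict[str, object]:
    """Decide what one poll cycle does with a counter's recent samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    result: Dict[str, object] = {"counter": monitor.counter}
    if monitor.mode in ("Data", "Both"):
        # Data collection: keep the latest value for reporting.
        result["data"] = samples[-1]
    if monitor.mode in ("Alert", "Both"):
        # Alerting: raise only if every poll in the window is over threshold.
        recent = samples[-SUSTAINED_POLLS:]
        result["alert"] = (
            len(recent) == SUSTAINED_POLLS
            and all(value > monitor.threshold for value in recent)
        )
    return result


if __name__ == "__main__":
    queue = CPU_MONITORS[2]
    print(evaluate(queue, [1, 3, 4, 5]))  # three polls above 2 threads
    print(evaluate(queue, [3, 1, 4]))     # one poll dips below: no alert
```

Requiring several consecutive breaches trades a slightly later alert for fewer alerts on brief spikes, which matches the policy's wording that a *sustained* queue indicates congestion.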


MEMORY SUBSYSTEM MONITORS
Perfmon Counter | Alert/Data | Polling Period | Threshold | Description
Memory\Available Bytes | Both | 5 min | 90% | Available Bytes is the amount of physical memory available to processes running on the computer.
Memory\Committed Bytes | Both | 5 min | 90% | Committed Bytes is the amount of committed virtual memory, in bytes. Committed memory is physical memory for which space has been reserved in the paging file on disk in case it needs to be written back to disk.
Memory\Pages/sec | Alert | 10 min | 750 | Pages/sec is the number of pages read from or written to disk to resolve hard page faults. This counter was designed as a primary indicator of the kinds of faults that cause system-wide delays.
Memory\Page Reads/sec | Both | 10 min | 200 | Page Reads/sec is the number of times the disk was read to resolve hard page faults.
Paging File\% Usage | Both | 10 min | 90% | % Usage is the percentage of the paging file instance in use.
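The 90% threshold on Committed Bytes only makes sense relative to some limit. The sketch below assumes it is measured against the Memory\Commit Limit counter, which is the same ratio Windows reports as Memory\% Committed Bytes In Use; the policy does not state this, so treat it as an interpretation. It also assumes "exceeds" means strictly greater than the threshold. Available Bytes is left out because the reference for its 90% threshold is not specified. The function names are hypothetical.

```python
from typing import List

# Policy thresholds from the memory table.
COMMITTED_PCT_LIMIT = 90.0
PAGES_PER_SEC_LIMIT = 750.0
PAGE_READS_PER_SEC_LIMIT = 200.0
PAGING_FILE_PCT_LIMIT = 90.0


def percent_committed(committed_bytes: int, commit_limit_bytes: int) -> float:
    """Committed Bytes as a percentage of the commit limit (assumed reference)."""
    if commit_limit_bytes <= 0:
        raise ValueError("commit limit must be positive")
    return 100.0 * committed_bytes / commit_limit_bytes


def memory_alerts(committed_bytes: int, commit_limit_bytes: int,
                  pages_per_sec: float, page_reads_per_sec: float,
                  paging_file_usage_pct: float) -> List[str]:
    """Return the memory counters whose values exceed their policy thresholds."""
    alerts: List[str] = []
    if percent_committed(committed_bytes, commit_limit_bytes) > COMMITTED_PCT_LIMIT:
        alerts.append(r"Memory\Committed Bytes")
    if pages_per_sec > PAGES_PER_SEC_LIMIT:
        alerts.append(r"Memory\Pages/sec")
    if page_reads_per_sec > PAGE_READS_PER_SEC_LIMIT:
        alerts.append(r"Memory\Page Reads/sec")
    if paging_file_usage_pct > PAGING_FILE_PCT_LIMIT:
        alerts.append(r"Paging File\% Usage")
    return alerts


GB = 1024 ** 3
# 3.8 GB committed against a 4 GB commit limit is 95%, above the 90% threshold.
print(memory_alerts(int(3.8 * GB), 4 * GB, 120, 40, 35.0))
```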

DISK SUBSYSTEM MONITORS
Perfmon Counter | Alert/Data | Polling Period | Threshold | Description
PhysicalDisk\% Disk Time | Both | 10 min | 80% | % Disk Time is the percentage of elapsed time that the selected disk drive is busy servicing read or write requests.
PhysicalDisk\Avg. Disk Queue Length | Both | 10 min | 1 | Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.
PhysicalDisk\Avg. Disk sec/Transfer | Both | 30 min | 100 ms | Avg. Disk sec/Transfer is the average time, in seconds, of a disk transfer.
LogicalDisk\% Free Space | Both | 4 hours | 5% or 100 MB | % Free Space is the ratio of the free space available on the logical disk unit to the total usable space provided by the selected logical disk drive.
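The % Free Space threshold of "5% or 100 MB" combines a relative and an absolute floor. A minimal sketch, assuming the alert fires when free space drops below either floor (the policy does not spell out the combination), might look like this; `low_free_space` is a hypothetical helper name.

```python
MB = 1024 * 1024
GB = 1024 * MB


def low_free_space(free_bytes: int, total_bytes: int,
                   min_free_pct: float = 5.0,
                   min_free_bytes: int = 100 * MB) -> bool:
    """True when a logical disk falls below either free-space floor."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_pct = 100.0 * free_bytes / total_bytes
    # Assumption: breaching either floor is enough to raise the alert.
    return free_pct < min_free_pct or free_bytes < min_free_bytes


print(low_free_space(int(1.5 * GB), 40 * GB))  # 3.75% free on a large volume
print(low_free_space(90 * MB, 1 * GB))         # ~8.8% free, but under 100 MB
print(low_free_space(4 * GB, 40 * GB))         # 10% free and well over 100 MB
```

The percentage floor is what catches trouble on large volumes, while the absolute floor protects small volumes where 5% may be only a few megabytes.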


NETWORK SUBSYSTEM MONITORS
Perfmon Counter | Alert/Data | Polling Period | Threshold | Description
Network Interface\Bytes Total/sec | Both | 5 min | 35% of total bandwidth | Bytes Total/sec is the rate at which bytes are sent and received over each network adapter.
Network Interface\Output Queue Length | | | | Output Queue Length is the length of the output packet queue (in packets). If this is longer than 2, delays are being experienced and the bottleneck should be found and eliminated if possible.
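Bytes Total/sec is a byte rate, while the 35% threshold is relative to the adapter's capacity, so the two must be put in the same units before comparing. The sketch below assumes capacity comes from the Network Interface\Current Bandwidth counter, which Windows reports in bits per second. It compares combined send and receive traffic against that single figure, which overstates utilization on full-duplex links; the policy does not say which convention it intends. The function names are hypothetical.

```python
from typing import List


def nic_utilization_pct(bytes_total_per_sec: float,
                        current_bandwidth_bps: float) -> float:
    """Bytes Total/sec expressed as a percentage of Current Bandwidth."""
    if current_bandwidth_bps <= 0:
        raise ValueError("current bandwidth must be positive")
    # Bytes Total/sec counts bytes; Current Bandwidth is in bits per second.
    return 100.0 * bytes_total_per_sec * 8 / current_bandwidth_bps


def network_alerts(bytes_total_per_sec: float, current_bandwidth_bps: float,
                   output_queue_length: int, max_util_pct: float = 35.0,
                   max_queue: int = 2) -> List[str]:
    """Return which network conditions from the policy are breached."""
    alerts: List[str] = []
    if nic_utilization_pct(bytes_total_per_sec, current_bandwidth_bps) > max_util_pct:
        alerts.append("bandwidth")
    if output_queue_length > max_queue:
        alerts.append("output queue")
    return alerts


# 5,000,000 bytes/sec on a 100 Mbit/s adapter is 40 Mbit/s, i.e. 40% utilization.
print(nic_utilization_pct(5_000_000, 100_000_000))
print(network_alerts(5_000_000, 100_000_000, output_queue_length=1))
```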

MISC. MONITORS
Monitor | Alert/Data | Polling Period | Threshold | Description
Dr. Watson Instances | Alert | 15 min | 1 | Number of Dr. Watson errors
Failed Login Attempts | Alert/Data | 1 hour | 50 | Number of failed login attempts
MSI Installs | Alert | 1 hour | 1 | Notification of software installations
Shared Files | Data | 20 min | N/A | Number of files open by user connections
System Uptime | Both | 10 min | < 10 min | Amount of time the system has been running
Automatic Services | Alert | 10 min | Stopped | Verifies that all automatic services are running
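Two of these monitors reduce to simple checks. Because System Uptime is polled every 10 minutes with a threshold of under 10 minutes, an uptime below the threshold means the server restarted since the previous poll. The Automatic Services monitor flags any service configured to start automatically that is not running. The sketch below is illustrative only: `ServiceStatus` and both functions are hypothetical, and the "Auto" and "Running" values are modeled on those reported by the WMI `Win32_Service` class.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, List


@dataclass(frozen=True)
class ServiceStatus:
    """Hypothetical snapshot of one Windows service."""
    name: str
    start_mode: str   # e.g. "Auto", "Manual", "Disabled"
    state: str        # e.g. "Running", "Stopped"


def stopped_automatic_services(services: Iterable[ServiceStatus]) -> List[str]:
    """Names of services set to start automatically that are not running."""
    return [s.name for s in services
            if s.start_mode == "Auto" and s.state != "Running"]


def restarted_since_last_poll(uptime: timedelta,
                              threshold: timedelta = timedelta(minutes=10)) -> bool:
    """Uptime under the threshold implies a restart within the polling interval."""
    return uptime < threshold


services = [
    ServiceStatus("Spooler", "Auto", "Running"),
    ServiceStatus("W32Time", "Auto", "Stopped"),
    ServiceStatus("Fax", "Manual", "Stopped"),   # manual services are ignored
]
print(stopped_automatic_services(services))
print(restarted_since_last_poll(timedelta(minutes=4)))
```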

REVISION HISTORY
Issue | Effective Date | By | Description
1.0 | | | Document Creation


ADDITIONAL READING

APPMANAGER SUITE WHITEPAPERS

Information Technology Infrastructure Library (ITIL)
Promoting a quality approach to IT Service Management, the ITIL publications illuminate the links and the principal relationships between all the Service Management and other Infrastructure Management processes.
ICT Infrastructure Management: covers network service management, operations management, management of local processors, computer installation and acceptance, and systems management.
Application Management: covers the software development life cycle and provides details on business change, with the emphasis on clear requirement definitions and implementation to meet business users' needs.
Service Delivery: covers Capacity Management, Financial Management for IT Services, Availability Management, Service Level Management and IT Continuity Management.
Service Support: covers Service Desk, Incident Management, Problem Management, Configuration Management, Change Management and Release Management.


SAMPLE PROJECT PLAN
