EGEE-II

ASSESSMENT OF PRODUCTION GRID INFRASTRUCTURE SERVICE STATUS
EU DELIVERABLE: DSA1.7
Document identifier: EGEE-II-DSA1.7-v3-0.doc
Date: 28/03/2008
Activity: SA1: Grid Operations, Support and Management
Lead Partner: SARA
Document status: Final Draft
Document link: https://edms.cern.ch/document/726263

Abstract: This document contains an assessment of the production Grid service at project month 18.

EGEE-II INFSO-RI-031688

© Members of EGEE-II collaboration

PUBLIC

1 / 56


Copyright notice: Copyright © Members of the EGEE-II Collaboration, 2006. See www.eu-egee.org for details on the copyright holders.

EGEE-II (“Enabling Grids for E-sciencE-II”) is a project co-funded by the European Commission as an Integrated Infrastructure Initiative within the 6th Framework Programme. EGEE-II began in April 2006 and will run for 2 years. For more information on EGEE-II, its partners and contributors please see www.eu-egee.org.

You are permitted to copy and distribute, for non-profit purposes, verbatim copies of this document containing this copyright notice. This includes the right to copy this document, in whole or in part, but without modification, into other documents if you attach the following reference to the copied elements: “Copyright © Members of the EGEE-II Collaboration 2006. See www.eu-egee.org for details”.

Using this document in a way and/or for purposes not foreseen in the paragraph above requires the prior written permission of the copyright holders. The information contained in this document represents the views of the copyright holders as of the date such views are published.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED BY THE COPYRIGHT HOLDERS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE MEMBERS OF THE EGEE-II COLLABORATION, INCLUDING THE COPYRIGHT HOLDERS, OR THE EUROPEAN COMMISSION BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THE INFORMATION CONTAINED IN THIS DOCUMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Trademarks: EGEE and gLite are registered trademarks held by CERN on behalf of the EGEE collaboration. All rights reserved.

Delivery Slip

              Name               Partner/Activity   Date   Signature
From          Jules Wolfrat      SARA/SA1
Reviewed by   Ian Bird           CERN/SA1
              Geneviève Romier   CNRS/SA2
              Samuel Skipsey     NA3
              Cristina Vistoli   INFN/SA1
              David Horn         TAU/NA3
              Etienne Urbah      JRA2
Approved by

Document Log

Issue   Date         Comment                                                  Author/Partner
1.0     13-02-2008   Draft for comment                                        Jules Wolfrat/SARA, Coen Schrijvers/SARA, Alistair Mills/CERN
1.1     18-02-2008   Editorial revision                                       Alistair Mills/CERN
1.4     27-02-2008   Changes from reviews by Ian Bird and Alistair Mills
                     and feedback from ROC managers                           Jules Wolfrat/SARA, Coen Schrijvers/SARA
2.0     29-02-2008   Draft ready for review                                   Alistair Mills/CERN, Jules Wolfrat/SARA
3.0     28-03-2008   Reviewers' comments added; replaced section 3.7.4.1
                     (JSPG) with an update from Dave Kelsey; added
                     information requested by Bob Jones; replaced Table 6
                     with two new tables (Tables 6 and 7)                     Jules Wolfrat/SARA, Coen Schrijvers/SARA


Document Change Record

Issue   Item   Reason for Change


TABLE OF CONTENTS

EGEE-II .......................................................................... 1
EU DELIVERABLE: DSA1.7 .......................................................... 1
COPYRIGHT NOTICE ................................................................ 2
DELIVERY SLIP ................................................................... 2
TABLE OF CONTENTS ............................................................... 4
TABLE OF FIGURES ................................................................ 9
1. INTRODUCTION ................................................................ 11
   1.1. Purpose ................................................................ 11
   1.2. Document organisation .................................................. 11
   1.3. Application area ....................................................... 11
   1.4. References ............................................................. 11
   1.5. Document amendment procedure ........................................... 14
   1.6. Terminology ............................................................ 14
   Glossary .................................................................... 14
2. EXECUTIVE SUMMARY AND OVERVIEW OF PRODUCTION SERVICE ........................ 17
   Table 1: Partner provided CPU January 2008 .................................. 18
   Table 2: Partners not fulfilling DoW CPU commitments in January 2008 ........ 18
   Table 3: Number of countries and sites supported by each ROC in January 2008 19
3. SA1 BASIC PRODUCTION SERVICES ............................................... 22
   3.1. Introduction ........................................................... 22
   3.2. Overview of production infrastructure .................................. 22
      3.2.1. Sites in production ............................................... 22
      3.2.2. Compute power ..................................................... 23
      3.2.3. Available and used storage ........................................ 27
      3.2.4. Number of countries ............................................... 28
      3.2.5. Number of active Virtual Organisations ............................ 28
      3.2.6. Resource usage .................................................... 30
      3.2.7. Job statistics .................................................... 31
      3.2.8. Number of active users ............................................ 36
      3.2.9. Data services ..................................................... 37
   3.3. Infrastructure services ................................................ 37
      3.3.1. Failover facilities ............................................... 38
      3.3.2. Status of core and site services .................................. 38
   3.4. Grid monitoring and control ............................................ 40
      3.4.1. Monitoring the operational state .................................. 40
      3.4.2. Grid Operator on Duty (COD) monitoring ............................ 41
      3.4.3. Status of grid monitoring and control ............................. 42
   3.5. Middleware deployment and introducing new resources .................... 42
      3.5.1. SA3 interaction ................................................... 42
      3.5.2. Technical Coordination Group ...................................... 42
      3.5.3. Introduction of new resources ..................................... 43
      3.5.4. Status of middleware deployment and introducing new resources ..... 43
   3.6. Resource and user support .............................................. 43
      3.6.1. Global Grid User Support statistics ............................... 43
   3.7. Grid management ........................................................ 45
      3.7.1. Service Level Agreement management ................................ 46
      3.7.2. Operational Application Group ..................................... 46
      3.7.3. LHC Tier-1 integration ............................................ 46
      3.7.4. Security activities ............................................... 46
   Table 4: List of documents produced by JSPG ................................. 47
4. SA1 SUPPORTING SERVICES ..................................................... 49
   4.1. International collaboration ............................................ 49
   4.2. Capture and provide requirements ....................................... 50
   4.3. Long term sustainability ............................................... 50
5. CONCLUSIONS AND LESSONS LEARNED ............................................. 51
   5.1. Recommendations ........................................................ 51
      5.1.1. Recommendations from DSA1.4 ....................................... 51
      5.1.2. New recommendations ............................................... 51
   5.2. Lessons learned ........................................................ 52
6. ANNEXES ..................................................................... 53
   6.1. Metrics table .......................................................... 53
   6.2. Status top level BDIIs ................................................. 53
   6.3. Status table SA1 sub-services .......................................... 53
   Table 5: Status overview SA1 sub-services ................................... 53
   6.4. Number of registered users per VO ...................................... 55

Table of Figures

Figure 1  Evolution of sites in EGEE and evolution of CPUs available ........................ 17
Figure 2  Jobs run on a monthly basis. Based on information from the accounting portal [R 9] 20
Figure 3  Number of sites in the EGEE-II infrastructure, per federation, from January 1 through October 1, 2007. These data relate to the size.1 metric. Information is taken from [R 12] ... 23
Figure 4  Expected evolution of computing resources over the course of EGEE-II in kSI2k units 24
Figure 5  Number of CPUs in the EGEE-II infrastructure per federation. These data relate to the size.1 metric. The information is from [R 12] ... 25
Figure 6  Comparison of the expected compute resources from the EGEE-II DoW for March 2007 with the real compute resources at that date. The latter is derived from the maximum CPU numbers for March 2007 times a factor of 1.5 for the conversion of CPU numbers to kSI2k ... 26
Figure 7  Total storage available in the EGEE-II infrastructure in terabytes. These data relate to the size.4 metric. The information is from [R 12] ... 27
Figure 8  Total storage used in the EGEE-II infrastructure in terabytes. These data relate to the size.4 metric. The information was taken from [R 12] ... 28
Figure 9  Number of active VOs (metric size.6) .............................................. 29
Figure 10 Normalized CPU time by region and project month for the first nine months of 2007. These data relate to the size.2 metric. Image taken from [R 9] ... 30
Figure 11 Production normalized CPU time by VO and project month. These data relate to the size.3 metric. Image taken from [R 9] ... 31
Figure 12 Number of jobs per month for the first nine months of 2007. This relates to the usage.1 metric. Source: [R 9] ... 32
Figure 13 Number of jobs per VO for the period January – September 2007. This relates to the usage.1 metric. Source: [R 9] ... 33
Figure 14 Number of jobs per VO per month for the assessment period, for jobs submitted through resource brokers. This relates to the usage.1 metric. Source: [R 24] ... 34
Figure 15 State-wise job distribution per month for the assessment period. This relates to the usage.3 metric. Source: [R 24] ... 35
Figure 16 Job success rate per VO for the assessment period. This relates to the usage.3 metric. Source: [R 24] ... 35
Figure 17 Job success rate per month for the assessment period. This relates to the usage.3 metric. Source: [R 23] ... 36
Figure 18 Average processing time of jobs. Metric service.2. Source: [R 24] ................. 36
Figure 19 Data throughput, metric usage.4 (monthly) ......................................... 37
Figure 20 Example of average daily data throughput, metric usage.4 .......................... 37
Figure 21 Availability of the SRM service at Tier-1 sites. Metric service.5 ................. 39
Figure 22 Overall FTS availability on the left, an example for one site on the right. Metric service.9 ... 39
Figure 23 CE availability at Tier-1/0 sites. Metric service.10 .............................. 40
Figure 24 GridMap example for all regions ................................................... 41
Figure 25 Number of tickets, for different categories, for each month in the reporting period 44
Figure 26 Average GGUS ticket response times in hours. A distinction is made between all tickets and tickets related to VOs, COD, ENOC, middleware, and other. These data relate to the user-support.5 metric. Over the reporting period, only information from April onwards is available. Source: [R 21] ... 45


1. INTRODUCTION
1.1. PURPOSE

The purpose of this document is to assess the status of the EGEE-II production service halfway through the second project year, on October 1, 2007. The “Description of Work” [R 1] formulates the mandate for the document as assessing the status of the service using the set of metrics defined in MSA1.1 (Operations metrics defined) [R 4]. This work builds on the results of deliverable DSA1.4, the assessment of the production service at the end of the first project year. Results are taken into account for the period January 1, 2007 to October 1, 2007. The cut-off date of October 1 was chosen to have a well-defined point in time and to avoid discussion about whether particular data should be included. More recent data are sometimes used to confirm observed trends.

1.2. DOCUMENT ORGANISATION

The document follows the conventions of EGEE-II deliverables.

1.3. APPLICATION AREA

This document is intended for members of the EGEE project and for external readers who want to understand the status of the activity and the areas in which improvements can be made.

1.4. REFERENCES

[R 1] EGEE-II Description of Work, https://edms.cern.ch/document/684101
[R 2] DSA1.4: Assessment of production service status, https://edms.cern.ch/document/726140
[R 3] EGEE-DSA1.8-Operational-Assessment-2, https://edms.cern.ch/document/489464
[R 4] MSA1.1: Operations metrics defined, https://edms.cern.ch/document/723928
[R 5] Metrics Implementation Group, http://egee-docs.web.cern.ch/egee-docs/list.php?dir=./mig/production/&
[R 6] Site registration policy & procedure, https://edms.cern.ch/document/503198
[R 7] Information about testing a site, http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Site-Testing/
[R 8] SAM admin page (SAMAP), https://cic.gridops.org/samadmin/
[R 9] Accounting portal, http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.html


[R 10] OAG management page on CIC portal, https://cic.gridops.org/index.php?section=oag&page=idcardupdate&subpage=
[R 11] Site information per region, http://goc.grid.sinica.edu.tw/gstat//Region.html
[R 12] Historical data from gstat monitor, http://goc.grid.sinica.edu.tw/gstat/data/
[R 13] GOCDB, https://goc.gridops.org/
[R 14] SAM, https://lcg-sam.cern.ch:8443/sam/sam.py
[R 15] COD operational procedures, https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEROperationalProcedures
[R 16] CIC portal, https://cic.gridops.org/
[R 17] TCG mandate, http://egee-intranet.web.cern.ch/egee-intranet/NA1/TCG/tcg.htm
[R 18] GGUS support contacts, https://gus.fzk.de/pages/resp_unit_info.php
[R 19] MSA1.8: Assessment of GGUS support, https://edms.cern.ch/document/726136
[R 20] GOC accounting services, http://goc.grid-support.ac.uk/gridsite/accounting/index.html
[R 21] EGEE-II monthly support usage, http://egee-docs.web.cern.ch/egee-docs/list.php?dir=./support/usage/monthly/&
[R 22] LCG service challenge information, https://twiki.cern.ch/twiki/bin/view/LCG/LCGServiceChallenges
[R 23] GRIDVIEW, http://gridview.cern.ch/GRIDVIEW/
[R 24] Gridview job statistics, http://gridview.cern.ch/GRIDVIEW/job_index.php
[R 25] GridMap, http://gridmap.cern.ch/gm/
[R 26] WLCG monitoring session, http://indico.cern.ch/sessionDisplay.py?sessionId=18&slotId=0&confId=3738#2007-01-23
[R 27] OSG project, http://www.opensciencegrid.org/
[R 28] GIN activity, http://forge.ogf.org/sf/sfmain/do/viewProject/projects.gin;jsessionid=5509BCEEE848D370AC80D7AD2BDAA0A9


[R 29] GIN OGF information, http://www.ogf.org/News/newscal_enews.php?dec07#LINK4
[R 30] Top ROC issues as input for TCG, https://twiki.cern.ch/twiki/bin/view/EGEE/SA1_TCG
[R 31] NGI workshop EGEE’06, http://indico.cern.ch/sessionDisplay.py?sessionId=81&slotId=1&confId=1504#2006-09-26
[R 32] JSPG home, http://proj-lcg-security.web.cern.ch/proj-lcg-security/default.html
[R 33] JSPG documents, http://proj-lcg-security.web.cern.ch/proj-lcg-security/documents.html
[R 34] OSCT dissemination, http://osct.web.cern.ch/osct/dissemination.html
[R 35] OSCT home, http://osct.web.cern.ch/osct/
[R 36] Operator on duty statistics, https://cic.gridops.org/index.php?section=roc&page=operationmetrics&subpage=operationmetrics_ggus
[R 37] WLCG MoU, http://lcg.web.cern.ch/LCG/C-RRB/MoU/WLCGMoU.pdf
[R 38] MSA1.3: Site operations policy agreement, https://edms.cern.ch/document/726129
[R 39] SLA Presentation SE Federation, http://indico.cern.ch/materialDisplay.py?contribId=9&sessionId=1&materialId=slides&confId=a063044
[R 40] Security Coordination Group, http://zope.pdc.kth.se/scg
[R 41] gLite 3.0 update history, http://glite.web.cern.ch/glite/packages/R3.0/updates.asp
[R 42] gLite 3.1 release information, http://glite.web.cern.ch/glite/packages/R3.1/
[R 43] GGUS – Global Grid User Support, https://gus.fzk.de/pages/home.php
[R 44] SLD document, https://edms.cern.ch/document/860386/0.5
[R 45] Failover information, http://goc.grid.sinica.edu.tw/gocwiki/Failover_mechanisms
[R 46] Operations Tools Map, http://goc.grid.sinica.edu.tw/gocwiki/Failover_mechanisms/OptoolsMap


[R 47] MJRA2.2.1: Security Audit - Strategy and Plan, https://edms.cern.ch/document/760826
[R 48] EGI_DS meeting Budapest 2007, http://web.eu-egi.org/events/workshops/oct07/
[R 49] EGI-DS (European Grid Initiative Design Study), http://web.eu-egi.org/
[R 50] DSA1.6: Report on ROC progress and issues (EDMS)
[R 51] SpecInt 2000 information, http://hepix.caspur.it/processors/
[R 52] Dashboard for job information, http://lxarda16.cern.ch/dashboard/request.py/dailysummary
[R 53] Atlas job information, http://dashb-atlas-job.cern.ch/dashboard/request.py/ErrorList
[R 54] gLite-WMS monitoring, http://egee-docs.web.cern.ch/egee-docs/list.php?dir=./monitoring/etienne/failures/&

1.5. DOCUMENT AMENDMENT PROCEDURE

Amendments, comments and suggestions should be sent to the authors. The procedures documented in the EGEE "Document Management Procedure" will be followed: http://egee-jra2.web.cern.ch/EGEE-JRA2/Procedures/DocManagmtProcedure/DocMngmt.htm

1.6. TERMINOLOGY

This subsection provides the definitions of the terms, acronyms, and abbreviations required to properly interpret this document. A complete project glossary is provided in the EGEE glossary: http://egee-jra2.web.cern.ch/EGEE-JRA2/Glossary/Glossary.html

Glossary

AMGA: ARDA Metadata Grid Application
APEL: Accounting Processor for Event Logs
API: Application Programming Interface
ARM: All ROC Managers’ Meeting
AUP: Acceptable Use Policy
BDII: Berkeley Database Information Index
CE: Compute Element
CIC: Core Infrastructure Centre
CIS: Core Infrastructure Services


COD: CIC-on-duty
DGAS: Distributed Grid Accounting System
DoW: Description of Work
EIS: Experiment Integration Support
E2ECU: End to End Coordination Unit
ENOC: EGEE Network Operations Centre
ESC: EGEE Support Committee
EUGridPMA: European Grid Policy Management Authority
FTS: File Transfer Service
GD: Grid Deployment
GIN: Grid Interoperability Now
GOC: Grid Operations Centre
GOCDB: GOC DataBase
GGUS: Global Grid User Support
GridICE: a distributed monitoring tool designed for Grid systems
IGTF: International Grid Trust Federation
JSPG: Joint Security Policy Group
kSI2k: kilo SPECint2000 unit
L&B: Logging and Bookkeeping service
LCG: LHC Computing Grid
LHC: Large Hadron Collider
MIG: Metrics Implementation Group
MoU: Memorandum of Understanding
NGI: National Grid Initiative
NREN: National Research and Education Network
OAG: Operations Advisory Group
OCC: Operational Coordination Centre
OGF: Open Grid Forum


OMC: Operations Management Centre
OSCT: Operational Security Coordination Team
OSG: Open Science Grid
PEB: Project Execution Board
PMnn: Project Month nn of EGEE, where nn is 01...24
RC: Resource Centre
ROC: Regional Operations Centre
SAM: Service Availability Monitoring
SAM ADMIN: Administrator interface to SAM
SFT: Site Functional Test
SL4: Scientific Linux version 4, a Linux distribution
SLA: Service Level Agreement
SLD: Service Level Description
SSC: Security Service Challenge
SRM: Storage Resource Manager
TA: Technical Annex
TCG: Technical Coordination Group
TPM: Ticket Process Manager
VO: Virtual Organisation
VOMS: Virtual Organisation Management System
WMS: Workload Management System, a gLite product


2. EXECUTIVE SUMMARY AND OVERVIEW OF PRODUCTION SERVICE
This summary presents a general overview of the service status for the period between January 1, 2007 and October 1, 2007. The EGEE production service has continued to expand in the second year of EGEE-II in all of its aspects: the number of sites, the available processors, and, more importantly, usage, which has grown significantly. More than 1,000,000 jobs per week were being run in September 2007, on average more than 100,000 jobs per day, an increase of almost 100% compared with January 2007. Updates to the gLite 3.0 middleware stack were distributed on a regular basis, and in summer 2007 the first release of gLite 3.1, supporting the SL4 operating system, was made available. The first release supported only a limited number of services, but more services were gradually added with subsequent updates; now almost all services are supported. This upgrade was performed with no disruption to the production service. Middleware upgrades are delivered as component updates, with no more big-bang changes. Figure 1 shows the continued increase in the number of sites participating in the production service. The number of CPUs available to VOs only shows an increase towards the end of the period displayed. The number of CPUs is a very dynamic property: CPUs are continuously replaced by newer, more powerful ones. The increase in numbers did, however, continue in the months following the period displayed.

Figure 1: Evolution of the number of sites in EGEE and of the CPUs available

These figures are compared with those committed in the EGEE-II Description of Work (DoW). Table 1 shows, for each regional federation, the actual number of CPUs provided by the partner institutes near the end of the second project year (early 2008) alongside the numbers committed at the start of the project, April 2006. Commitments in numbers of CPUs were only given for the start of the project; commitments for the second project year were expressed in kSI2k units, so no direct comparison is possible here. Section 3.2.2 gives a more detailed analysis of the current compute power and compares it with the commitments in kSI2k units. For the most part, the partners now provide significantly more than they committed at the project start, although some partners have still not fulfilled their CPU commitments.


Table 1: Partner provided CPU January 2008
ROC    | DoW CPU commitment (April 2006) | Partner actual CPU | Non-partner contribution | Total CPU
CERN   |  1800 |  7028 | 22% |  9048
France |  1252 | 12240 | 14% | 14225
De/CH  |  1852 |  4364 | 22% |  5632
Italy  |  2280 |  7998 |  0% |  7998
UK/I   |  2010 |  5376 | 48% | 10402
CE     |  1163 |  2162 | 36% |  3243
NE     |  1860 |  3555 | 24% |  4700
SEE    |  1289 |  3512 |  0% |  3512
SWE    |   898 |  1652 | 26% |  2254
Russia |   445 |   888 | 16% |  1052
A-P    |   801 |  1177 | 51% |  2412
Total  | 15650 | 49887 | 23% | 64478

Twenty-three percent of the CPU resources are provided by non-partner sites, almost the same percentage as last year (24%). Table 2 shows in more detail where the CPUs made available fall below the commitments by more than 20%. The commitments are again those for the start of the project, April 2006, while the actual numbers are for early 2008. The list has shrunk since last year, and in some cases the shortfall is not significant: per-CPU compute power has increased, so the committed capacity can be provided by fewer systems. NIIFI falls short because it has Sparc-III and IV processors (212 in total) that are not yet supported by gLite. UKBH is a Nordic ARC site which cannot yet be interfaced from the EGEE infrastructure. UiB is currently upgrading its environment. UPV actually offers 82 CPUs, but some belong to a second cluster whose CPUs are not published in the EGEE information system, although they can be used by EGEE jobs. For KISTI the actual number also seems to be higher, 35. So, except for UKBH, all partners are exceeding or effectively matching their commitments.

Table 2: Partners not fulfilling DoW CPU commitments in January 2008

ROC | Partner not meeting commitments | DoW CPU commitment | Actual CPU
CE  | NIIFI      |  80 | 12
NE  | UKBH       | 400 |  0
NE  | UiB        |  50 | 32
SWE | UpV-GryCAP |  56 | 20
A-P | KISTI      |  50 | 27
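The ">20% below commitment" rule used to compile Table 2 is simple bookkeeping; the following is an illustrative sketch (not project code), using the commitment/actual pairs from the table above.

```python
# Sketch: flag partners whose deployed CPUs fall more than 20% below
# their DoW commitment, as done for Table 2. Numbers are taken from
# Table 2 of this document; the dict layout is purely illustrative.

commitments = {  # partner -> (DoW CPU commitment, actual CPU)
    "NIIFI": (80, 12),
    "UKBH": (400, 0),
    "UiB": (50, 32),
    "UpV-GryCAP": (56, 20),
    "KISTI": (50, 27),
}

def shortfalls(data, tolerance=0.20):
    """Partners delivering less than (1 - tolerance) of their committed CPUs,
    mapped to the fraction of the commitment actually delivered."""
    return {
        name: round(actual / committed, 2)
        for name, (committed, actual) in data.items()
        if actual < (1 - tolerance) * committed
    }

print(shortfalls(commitments))
```

All five partners listed fall below the 80% threshold, which is why they appear in Table 2.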


Table 3 lists the number of resource sites provided by the partners and the total number of sites for each region at the end of January 2008. The total number of sites is around twice that provided by the project partners in SA1, with ~75% of the resources provided by the partners’ sites.

Table 3: Number of countries and sites supported by each ROC in January 2008

ROC    | # partner countries | Total countries | # partner sites | Total sites
CERN   |  1 |  6 |   1 |  14
France |  1 |  1 |  11 |  15
De/CH  |  2 |  2 |   6 |  15
Italy  |  1 |  1 |  38 |  38
UK/I   |  2 |  2 |   7 |  26
CE     |  7 |  7 |  18 |  25
NE     |  6 |  8 |   8 |  28
SEE    |  7 |  9 |  36 |  39
SWE    |  2 |  2 |  11 |  19
Russia |  1 |  2 |   8 |  14
A-P    |  2 |  8 |   2 |  21
Total  | 32 | 48 | 146 | 254

More than 30 million jobs were run in the first nine months of 2007, with a continuous increase observed. This is almost a doubling compared with the first project year of EGEE-II (18.9 million jobs in 12 months). Figure 2 displays the number of jobs per month for three different sets of VOs: those related to operations (OPS: monitoring jobs, site quality testing), the LHC VOs, which represent the largest fraction, and all other VOs. More detail is given in the following chapters, but it is important to note that the number of jobs run by the other VOs also doubled in these nine months, reaching almost 20,000 jobs per day at the end of the period.


[Figure 2 chart: stacked monthly job counts, 0 to 5,000,000 jobs, for the OPS, LHC and non-LHC VO groups, January 2007 through September 2007]

Figure 2: Jobs run on a monthly basis. Based on information from the accounting portal [R 9]

More and more VOs make significant use of the infrastructure: more than 40 used over 1 CPU-year per week over extended periods during the first nine months of 2007, compared with 26 during the previous period.

For data management, the SRM 2.2 interface implementation was smoothly introduced into production in 2007 and is now successfully deployed by the major data management centres. Aggregate data rates of around 1 GB/s are sustained for days at a time. Monitoring of the availability of most of the core services has been fully implemented for all sites.

One of the most successful tasks of SA1 is the Grid Operator on Duty ("COD" for historical reasons), a distributed effort in which staff from 10 of the 11 ROCs each take one week as the duty operator team. Together with the tools used to monitor the status of the sites, and the GGUS ticket and support infrastructure, this activity is crucial for maintaining the usability and stability of sites. The results of the Site Availability Monitoring (SAM) tool are used by applications to select "good" sites according to their own criteria; the tool allows an application to define and run its own site tests.

The user support activities have continued to improve response times for ticket processing; the average response times show a decreasing trend. Much progress has also been made in the second year on introducing Service Level Agreements. The working group for this activity has produced a first, widely accepted Service Level Description (SLD) for sites in production. This document is now the basis on which the federations sign SLDs with their sites.

Interoperability and interoperation are now well established with Open Science Grid: at least some of the LCG VOs rely upon it as part of their computing strategy, with jobs being submitted from EGEE sites onto OSG resources.
Interconnections and procedures for user support and operational problem solving have been defined and implemented.


For the SA1 activity, several objectives have been defined, and the services implemented to achieve these objectives are considered in this document. Most of these services are not described extensively here; references are given for detailed information. Major changes in the set-up of these services, and newly implemented services, are discussed in more detail. A table summarizing the results for the different services SA1 proposed to implement is given in the appendix; see Table 5.

Many monitoring tools are used in EGEE: standard sets of tools are used by the Grid Operators and support staff, while different ROCs and sites depend upon different fabric monitoring tools. Whilst perhaps not ideal, it is clear that the essential autonomy of sites requires that they be able to use the monitoring most appropriate for their situation. In autumn 2006, a monitoring working group was started to draw together the work already done in the many areas of monitoring. One of its essential goals is to feed the information and results from the top-level monitoring tools (Site Availability Monitoring, Information System checks, etc.) directly into the sites' local fabric monitors. Sites can now add information from the SAM tests directly into local monitoring tools such as Nagios. The working group also aims to standardise the publication of information from monitoring tools, so that the information can be harnessed and presented in standard ways, no matter what its origin. This group has made much progress, and the results are maintained at a central location [R 5].

Where possible, the metrics defined in the milestone document MSA1.1, published at the start of EGEE-II, are used to assess the results for the different services. With respect to the size of the production infrastructure, such as the number of sites, available compute power, storage facilities, and number of VOs, we conclude that the project is very successful and still growing.


3. SA1 BASIC PRODUCTION SERVICES
3.1. INTRODUCTION

The purpose of this document is formulated in the DoW [R 1] as follows:

"As a basis for planning and monitoring improvements to the reliability and stability of the service, several deliverables (DSA1.4, DSA1.6) are proposed to assess the status of the service. These will use a set of metrics to be defined in MSA1.1 and take as a baseline the final service assessment from EGEE (DSA1.8). The metrics defined in MSA1.1 will also be reported on in the quarterly reports."

At the end of the first project year, "DSA1.4: Assessment of production service status" [R 2] was published, and the current document builds on it by examining the status of the project on October 1, 2007 and evaluating the changes since DSA1.4 was written at the beginning of 2007. The date of October 1 was chosen to avoid discussions about which data should be taken into account, e.g. requests to update figures with more recent data.

The same list of services as in DSA1.4 is discussed in the following paragraphs, but mainly new developments are considered; no detailed description of the services is given, as these can be found in DSA1.4 and the other references. For the discussion, the metrics defined in MSA1.1 are used, and we refer to the acronyms defined therein for particular metrics (e.g. size.1 for the number of sites and CPUs in production) where applicable. The Metrics Implementation Group (MIG), an SA1 working group, maintains an overview of how information on the defined metrics can be retrieved and aims to implement the missing parts [R 5]. This is the basis for the information used in this document.

In the next section, the overall status of the production infrastructure is presented. In the subsequent sections, the status of the basic production services supported by SA1 is presented, supported by metrics results where available.
These services are: core infrastructure services, grid monitoring and control, middleware deployment and introduction of new resources, resource and user support, and grid management. Chapter 4 gives the status of the supporting services: international collaboration, capturing and providing requirements, and long-term sustainability. In the final chapter, the conclusions and recommendations are presented.

3.2. OVERVIEW OF PRODUCTION INFRASTRUCTURE

In this section, we describe the size of the infrastructure based on the certified sites of the production service. We do not count sites that are part of the pre-production service or that are still in the acceptance phase. Sites are certified by the ROC of the region the site belongs to, following the rules given by the registration policy and procedure document [R 6] and by testing the site [R 7], [R 8].

3.2.1. Sites in production

Figure 3 presents the development of the number of sites per region since January 1, 2007. The numbers are based on data for the first day of each month as provided by the information system monitoring tool, Gstat [R 12]. The total number of sites has increased from 212 to 243, an increase of almost 15%, but most of this increase took place before summer 2007. Therefore, it appears that some kind

Doc. Identifier:

Assessment of production Grid infrastructure service status

EGEE-II-DSA1.7-v3-0.doc Date: 28/03/2008

of saturation has been reached. The numbers for the different regions also show that for some regions the number of sites was almost stable over this whole period.

Figure 3: Number of sites in the EGEE-II infrastructure, per federation, from January 1 through October 1, 2007. These data relate to the size.1 metric. Information is taken from [R 12].

3.2.2. Compute power

Figure 4 shows the compute power expected to be available in the infrastructure at different times in the lifetime of the project. This forecast is from the DoW [R 1] and is based on the numbers given by the federations in kSI2k units. Information on the actually available compute power is published daily by Gstat [R 11]: the total number of CPUs per site and region. No conversion to kSI2k values is done, so a direct comparison with the numbers from Figure 4 is not possible.


Figure 4: Expected evolution of computing resources over the course of EGEE-II, in kSI2k units

Figure 5 presents the number of CPUs counted per region. The maximum number refers to the period preceding the monitoring date; for January, for example, it is the highest number of CPUs ever counted before January 2007 (the historical maximum). "CPUs in production" is the number counted on the day of measurement, e.g. the first of January for January. The period covered is January 1 through October 1, 2007, and the numbers are based on the historical data from Gstat [R 12]. The maximum number of CPUs drops in the middle of this period, from more than 50,000 in February to almost 40,000 in the summer. This can be explained by sites, or clusters behind a particular CE, being phased out (a historical maximum can never decrease for a site, so a drop can only be explained by sites being taken out of production). Occasionally, no number is published even though the site was in production at the time. At the end of the period, the maximum rises again, to higher values than at the beginning. The actual number of processors in production as seen at the time of monitoring (top part of the figure) fluctuates more than the maximum, as may be expected, and is always lower than the maximum. This shows the dynamic behaviour of the infrastructure: sites take clusters, or parts of them, out of production, replace CPUs, and so on. In addition, the compute power per CPU continuously increases, something that is not reflected in the CPU counts alone. To make at least some comparison between the kSI2k numbers of Figure 4 for March 2007 and the maximum CPU counts on April 1, 2007, the latter have been converted to kSI2k by multiplying by 1.5 (1.5 kSI2k per CPU is probably a conservative number). The results are presented in Figure 6. All federations except the Russian one exceed their commitments.
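The CPU-to-kSI2k conversion used for this comparison is a single multiplication; a minimal sketch, assuming the flat 1.5 kSI2k-per-CPU factor stated in the text:

```python
# Sketch of the conversion behind Figure 6: raw CPU counts are turned
# into estimated kSI2k capacity by multiplying with a flat factor of
# 1.5 kSI2k per CPU (the conservative assumption stated in the text).

KSI2K_PER_CPU = 1.5

def cpus_to_ksi2k(cpu_count, factor=KSI2K_PER_CPU):
    """Convert a raw CPU count into an estimated kSI2k capacity."""
    return cpu_count * factor

# Example: a federation publishing 10,000 CPUs is credited with
# 15,000 kSI2k when compared against its DoW commitment.
print(cpus_to_ksi2k(10_000))  # -> 15000.0
```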


Figure 5 Number of CPUs in the EGEE-II infrastructure per federation. These data relate to the size.1 metric. The information is from [R 12]


[Figure 6 bar chart: per-federation comparison (Asia-Pacific, CERN, Central Europe, France, Germany/Switzerland, Italy, Northern Europe, Russia, South-Eastern Europe, South-Western Europe, UK/I) of the DoW target for March 2007 against the actual compute resources at the end of March 2007, in kSI2k, 0 to 15000]

Figure 6 Comparison of expected compute resources from the EGEE-II DoW for March 2007, and the real compute resources at that date. The latter is derived from the maximum CPU numbers for March 2007 times a factor 1.5 for the conversion of CPU numbers to kSI2k


3.2.3. Available and used storage

The available storage as published by the sites in the information system is also reported by Gstat; see Figure 7. Only at the end of the period do we notice an increase in available storage.

Figure 7: Total storage available in the EGEE-II infrastructure in terabytes. These data relate to the size.4 metric. The information is from [R 12]

Figure 8 shows the used storage, as given by the data from [R 12]. Here we also see an increase at the end of the period. This increase is expected to continue as the High Energy Physics VOs start large-scale production in 2008 (in December 2007 we already see 48 PB of storage available, more than double the October 1 value). The storage available to each VO in the production service, which corresponds to the size.5 metric, is not trivially available from the various sources. In the above data, taken from the information system, disk and tape are not distinguished, which means that some large values (e.g. CERN) include tape storage.


Figure 8: Total storage used in the EGEE-II infrastructure in terabytes. These data relate to the size.4 metric. The information was taken from [R 12]

3.2.4. Number of countries

In October 2007, the number of countries from which sites are registered and managed by EGEE operations was 32, and there were an additional 15 countries with sites integrated with the EGEE infrastructure. These numbers are obtained from the regional information about sites [R 11]. The core number of 32 includes a number of countries that are part of the infrastructure through the associated projects Balticgrid and SEEgrid. The number of countries is the size.8 metric from [R 4].

3.2.5. Number of active Virtual Organisations

The number of active VOs is given in Figure 9, as retrieved from the accounting portal [R 9]. The number of officially registered VOs increased from just above 100 in March 2007 to more than 130 by the end of January 2008 (see [R 10]). Comparison with the number of VOs advertised in the BDII (293) shows that a significant number of VOs have not yet registered. In the first year of EGEE-II, on average 10 VOs each week used more than 1 year of CPU time (the lowest part of the columns in Figure 9). This number hardly increased over the current period. However, the detailed tables in the accounting portal show that while 26 VOs in total passed this limit during the previous reporting period, more than 40 VOs have now passed it for at least one week. This shows again that the load from VOs has spread over time and that the number of active VOs is still growing.
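To put the "more than 1 CPU-year per week" activity threshold in perspective, a small back-of-the-envelope sketch (our arithmetic, not from the source) expresses it as a number of continuously busy CPUs:

```python
# Sketch: the "1 CPU-year of CPU time per week" threshold, expressed as
# the number of CPUs a VO would need to keep fully busy. Illustrative
# arithmetic only; the accounting portal does not publish this quantity.

HOURS_PER_YEAR = 365 * 24  # 8760 CPU-hours in one CPU-year
HOURS_PER_WEEK = 7 * 24    # 168 wall-clock hours in one week

def equivalent_busy_cpus(cpu_hours_per_week):
    """Fully busy CPUs implied by a weekly CPU-hour consumption."""
    return cpu_hours_per_week / HOURS_PER_WEEK

# Burning one CPU-year per week keeps roughly 52 CPUs busy full-time.
print(round(equivalent_busy_cpus(HOURS_PER_YEAR), 1))
```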


Figure 9 Number of active VOs (metric size.6)


3.2.6. Resource usage

The usage of CPU resources is summarized in Figure 10 and Figure 11.

Figure 10: Normalized CPU time by region and project month for the first nine months of 2007. These data relate to the size.2 metric. Image taken from [R 9].

Figure 10 shows the CPU usage from the perspective of the resource owners (aggregated by region), while Figure 11 shows it from the perspective of the users (aggregated by VO). The normalization factor used is the SpecInt2000 value published by the sites in the information system, so absolute values can be somewhat off due to errors in the published values, but the trends should be correct. The numbers cover all VOs; the "Other VOs" part also contains non-EGEE VOs, as can be seen from the totals, which are the same in both figures. The figures show that there is still a steady increase in consumed CPU time. Some of the larger VOs show considerable fluctuations, due to specific challenges run for these VOs. For most regions, the load is stable or increasing.
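The normalisation described above scales each site's raw CPU time by its published SpecInt2000 rating. A minimal sketch of one plausible reading of this, relative to a 1 kSI2k reference processor (an assumption for illustration, not the portal's actual code):

```python
# Sketch (assumption, not the accounting portal's implementation):
# normalise raw CPU time using the SpecInt2000 value each site publishes
# in the information system, relative to a 1000-SI2k (1 kSI2k) reference.

REFERENCE_SI2K = 1000.0

def normalised_cpu_hours(raw_cpu_hours, site_si2k):
    """Scale raw CPU time by the site's published SpecInt2000 rating.
    A site twice as fast as the reference contributes twice the
    normalised time for the same raw CPU usage."""
    return raw_cpu_hours * site_si2k / REFERENCE_SI2K

print(normalised_cpu_hours(100.0, 1500.0))  # -> 150.0
```

This also illustrates the caveat in the text: if a site publishes a wrong SpecInt2000 value, its normalised contribution is scaled by exactly that error.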


Figure 11: Production normalized CPU time by VO and project month. These data relate to the size.3 metric. Image taken from [R 9].

3.2.7. Job statistics

The total number of jobs grew steadily over this period, as can be seen in Figure 12. At the end of the period, more than one million jobs were run each week. The ops and dteam VOs are used for monitoring and testing the infrastructure, so a relatively large number of jobs is seen for these two VOs, although their load is low, as can be seen from Figure 11. Figure 13 shows a pie chart of the division of the number of jobs per VO; a few VOs dominate the number of jobs run.


Figure 12 Number of jobs per month for the first nine months of 2007. This relates to the usage.1 metric. Source: [R 9]


Figure 13 Number of jobs per VO for the period January – September 2007. This relates to the usage.1 metric. Source: [R 9]


The GridView tool [R 23] also shows job statistics, based on information from resource brokers (RBs). For this class of jobs, further details of state and failure can be obtained. Figure 14 shows the total number of jobs submitted. The totals agree with the numbers seen in Figure 12, except for the last month shown, September, where the two figures diverge. The explanation is that some VOs increasingly use so-called pilot jobs: once a pilot job starts, additional jobs are run under its control (on the same nodes that were reserved for it). These additional jobs are not seen by the RB, and therefore not by GridView, but they are seen in the accounting repository. This effect is visible, for instance, for the ATLAS VO: while in Figure 12 the number of ATLAS jobs for June and September is almost the same, in Figure 14 it decreases between June and September. Currently GridView also does not process jobs scheduled through the gLite WMS. It is planned to use L&B (Logging and Bookkeeping) information, after which WMS-based information will be included as well.

Figure 14: Number of jobs per VO per month for the assessment period. This relates to the usage.1 metric. Source: [R 24]; jobs submitted through resource brokers only.

Figure 15 through Figure 17 show state information for the jobs.


Figure 15 State-wise job distribution per month for the assessment period. This relates to the usage.3 metric. Source: [R 24]

Figure 16: Job success rate per VO for the assessment period. This relates to the usage.3 metric. Source: [R 24]

Figure 16 shows that the success rate can differ considerably between applications/VOs. This is to be expected, as the complexity of jobs (dependency on different services, data dependencies, local environment) can differ greatly between applications. Also, some VOs use monitoring information (from SAM testing, see section 3.4) to detect whether sites fail for their jobs; such sites are then not selected for job submission for as long as the tests fail. Figure 17 shows that the overall job success rate hardly varies over time. Apparently, it is difficult to achieve overall success rates well above 70%. Better procedures for feeding failure information back into the grid middleware, such as the site-selection mechanism discussed above, are probably needed to improve on this.
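The success rates plotted in Figures 16 and 17 reduce to a simple ratio of successful to total jobs. A minimal sketch, using hypothetical per-VO tallies (the VO names and numbers are invented for illustration):

```python
# Sketch: per-VO job success rate as plotted in Figure 16, computed from
# (successful, failed) job tallies. The tallies below are hypothetical
# and for illustration only; real numbers come from GridView [R 24].

def success_rate(successful, failed):
    """Fraction of jobs that ended successfully; 0.0 when no jobs ran."""
    total = successful + failed
    return successful / total if total else 0.0

jobs = {"vo_a": (9000, 1000), "vo_b": (7000, 3000)}  # hypothetical VOs
rates = {vo: success_rate(s, f) for vo, (s, f) in jobs.items()}
print(rates)  # {'vo_a': 0.9, 'vo_b': 0.7}
```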


In Figure 18, the average processing time, i.e. the time a job is running at a site, is given on a monthly basis. The average is around 3 hours or higher, which indicates that the number of long-running jobs must be fairly high, since there is also a large number of short jobs, e.g. for testing purposes.

Figure 17 Job success rate per month for the assessment period. This relates to the usage.3 metric. Source: [R 23]

Figure 18 Average processing time of jobs. Metric service.2. Source: [R 24]

3.2.8. Number of active users

The number of users registered for each VO can be counted. Additional information on their active use of the infrastructure is not yet generally available, although a test implementation exists. Users can be identified from the information in their X.509 certificates; however, this information is not publicly associated with the accounting information, for reasons of privacy protection. Table 6 in annex 6.4 gives the number of registered users for each VO hosted by a VOMS server on the EGEE infrastructure.


3.2.9. Data services

Figure 19 gives the average data throughput rate for all VOs on the infrastructure. The data transported are dominated by the LHC-associated VOs. Figure 20 gives the daily variation for November. Sustained data rates of more than 0.6 GB/s over a whole month are now seen, and daily rates almost reach 1 GB/s.
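To put the sustained rate in perspective, a back-of-the-envelope conversion from rate to monthly volume (assuming a 30-day month and decimal units) can be sketched as:

```python
# Convert a sustained transfer rate to total volume per month
# (illustrative arithmetic; assumes a 30-day month and decimal GB/TB).

def monthly_volume_tb(rate_gb_per_s, days=30):
    """Total volume in TB moved at a constant rate for `days` days."""
    seconds = days * 24 * 3600
    return rate_gb_per_s * seconds / 1000.0   # GB -> TB

vol = monthly_volume_tb(0.6)   # sustained 0.6 GB/s for a 30-day month
```

Under these assumptions, a sustained 0.6 GB/s corresponds to roughly 1.5 PB moved in a month.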

Figure 19 Data throughput, metric usage.4 (monthly)

Figure 20 Example of average daily data throughput, metric usage.4

3.3. INFRASTRUCTURE SERVICES

The objective is to operate a set of essential services, such as the information services, resource brokers, data management services, administration of the Virtual Organisations (VOs) and other core services agreed with the VOs, that bind distributed resources into a coherent multi-VO infrastructure. Core services are flagged as such in the GOCDB as an attribute of the registered services of sites. For several core services, such as resource brokers, top-level information systems (BDII servers), data management services, and user administration (VOMS) services, more than one instance is present in the infrastructure in order to spread the load. Each region now operates at least one top-level BDII and


most regions now run this service in high-availability mode (e.g. using at least two nodes for this service).

3.3.1. Failover facilities

Failover facilities are important for core services like the monitoring, management and collaboration tools used in SA1 operations. At the end of EGEE-I, the activity started to plan, implement and manage availability-oriented replicas for several of the tools used in EGEE Grid operations [R 45]; the current status of the failover implementation is maintained at this reference. Failover is now available for GSTAT, SAM-ADMIN, and the GridICE service, partly in place for the CIC portal and the GOCDB, in test for GGUS, and yet to be done for SAM. The dependencies between the different services are given in [R 46], which shows that many services rely on the GOCDB. It is therefore important that the failover facilities for the GOCDB become fully functional soon.

3.3.2. Status of core and site services

The availability of the services is monitored by GridView [R 23]. Below are some examples of the results; see Figure 21 through Figure 23. The figures are for the LHC Tier-1 sites; as can be seen from the figures, this monitoring has been in place since June 2007. Results for other sites are now also available. The examples cover both a core service (the FTS service) and a site-specific service (the CE). The availability statistics are based on the results of the SAM tests described in the next section.
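Client-side use of replicated service instances, such as the top-level BDIIs mentioned above, amounts to a failover loop over candidate endpoints. A minimal sketch (host names are hypothetical, and a real information-system client would issue an LDAP query rather than a plain TCP probe):

```python
# Sketch of client-side failover across replicated service endpoints
# (illustrative; host names are hypothetical, and a real BDII client
# would perform an LDAP search rather than a bare TCP connect).

import socket

def first_reachable(endpoints, timeout=2.0, probe=None):
    """Return the first endpoint that accepts a connection, else None.

    `probe` can be injected for testing; by default a TCP connect is tried.
    """
    if probe is None:
        def probe(host, port):
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return True
            except OSError:
                return False
    for host, port in endpoints:
        if probe(host, port):
            return (host, port)
    return None

# Example with an injected probe: the first replica is down, so the
# client falls over to the second.
replicas = [("bdii1.example.org", 2170), ("bdii2.example.org", 2170)]
up = {"bdii2.example.org"}
chosen = first_reachable(replicas, probe=lambda h, p: h in up)
```

Running at least two such replicas per region is what the text above describes as high-availability mode.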


Figure 21 Availability of SRM service at tier1 sites. Metric service.5

Figure 22 Overall FTS availability on the left, example for one site on the right. Metric service.9


Figure 23 CE availability at Tier-1/0 sites. Metric service.10

3.4. GRID MONITORING AND CONTROL

The objective of this service is to proactively monitor the operational state of the Grid and its performance, initiating corrective action to remedy problems arising with either the core infrastructure or Grid resources. The basic environment for testing the health of the infrastructure is SAM, a collection of tests which gather information on how the infrastructure is functioning.

3.4.1. Monitoring the operational state

Information on the operational state is given by the SAM test results [R 14]. A detailed description of SAM can be found in the deliverable DSA1.4 [R 2]. The GridView examples given in the previous section are based on SAM results. The development of new tests is an ongoing activity; e.g. tests for the availability of VOMS, MyProxy and L&B servers are needed. This also addresses EU review recommendation 42. There is a new tool that visualises the status of the sites based on the SAM test results [R 25]. Sites are displayed in a map, grouped by region, where the size of the area that a site occupies is related to its number of CPUs. Figure 24 gives an example with all regions displayed, but other views are possible too, e.g. a single region or all Tier-1 sites. Sites that are OK are shown in green (dark grey), sites


with errors are shown in red (black), and sites in a degraded state in orange (light grey). By hovering over the map with the mouse, information on the site in a particular rectangle is displayed; the size of a rectangle is proportional to the number of CPUs of the site. The tool is very convenient for getting a quick overview of the status of the infrastructure.
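The sizing rule used by the map, area proportional to CPU count with a colour derived from SAM status, can be illustrated with a minimal sketch (the site data are hypothetical, and the actual tool uses a full treemap layout algorithm, which is considerably more involved):

```python
# Sketch of the map's sizing rule: each site gets an area proportional
# to its CPU count and a colour derived from its SAM status.
# (Illustrative only; the real tool computes a treemap layout.)

STATUS_COLOUR = {"ok": "green", "degraded": "orange", "error": "red"}

def site_tiles(sites, total_area=10000.0):
    """Map each site name to (area, colour); areas sum to total_area."""
    total_cpus = sum(s["cpus"] for s in sites)
    return {
        s["name"]: (total_area * s["cpus"] / total_cpus,
                    STATUS_COLOUR[s["status"]])
        for s in sites
    }

# Hypothetical sites: SiteA has three times the CPUs of SiteB.
sites = [
    {"name": "SiteA", "cpus": 3000, "status": "ok"},
    {"name": "SiteB", "cpus": 1000, "status": "error"},
]
tiles = site_tiles(sites)
```

The proportional areas are what make large failing sites stand out at a glance, which is the point of the overview described above.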

Figure 24 GridMap example for all regions

The SAM facilities are indispensable for monitoring and maintaining the health of the infrastructure.

3.4.2. Grid Operator on Duty (COD) monitoring

It is the responsibility of the Grid Operator on Duty (COD) teams to monitor and maintain the health of the infrastructure. The COD teams, drawn from the different regions, operate on a weekly rotation. The EGEE deliverable DSA1.8 [R 3] describes the creation and operation of the COD teams; deliverable [R 2] gives details about their responsibilities and operations, and their duties are described in [R 15]. The test results from SAM are an important source of information for the COD teams. These results are inspected through an interface called the dashboard. The CIC portal [R 16] is the focal point for COD operations. The operation of the COD teams ensures that problems in core services or in resources at sites are discovered in time and that unsolved problems are escalated in time.


Weekly information on the ticket-processing activity of the COD teams can be found on the CIC portal [R 36] and is displayed in Figure 25. For the first project year (April 2006 – March 2007), the average number of tickets was around 40, although with large fluctuations in the weekly totals and occasional peaks around 100. For the current period, the number of tickets created by COD operations is between 200 and 400 per month. Ticket processing is discussed in more detail in section 3.6.

3.4.3. Status of grid monitoring and control

Monitoring of grid services is functioning well. The groups responsible for SAM and GridView development have made considerable progress. The addition of monitoring information for more services, like VOMS and MyProxy, and for new services like AMGA, is needed; development for these additions is ongoing.

3.5. MIDDLEWARE DEPLOYMENT AND INTRODUCING NEW RESOURCES

The objective of this service is to deploy middleware distributions from SA3 on Resource Centres throughout the Grid infrastructure. This involves close interaction and feedback with SA3 and the middleware activities both within and external to the project, as well as with the applications. Where new Resource Centres are to be incorporated into the Grid, assistance must be provided for both middleware installation and the introduction of operational procedures.

3.5.1. SA3 interaction

The goal of the SA3 activity is to manage the process of building deployable and documented middleware distributions. These distributions are delivered to SA1 for deployment on the pre-production and production infrastructures. More information on the responsibilities of SA3 can be found in the DoW [R 1], p. 292 onwards. SA3 delivers new releases and updates of the middleware to SA1. In 2007, 26 update releases to gLite 3.0 were distributed, on average one every two weeks. Emergency fixes are, of course, distributed as soon as possible. [R 41] gives a history of the gLite 3.0 updates.
In June 2007, the first release of gLite 3.1 for SL4 was distributed. The first release supported only the WN and UI node types, but more services have been added since; for a complete list of supported services, see [R 42]. Nine updates to this release were distributed in 2007, on average one every three weeks. It is the responsibility of the ROCs to distribute the release information in their region and to do further testing where needed because of dependencies on the local environment. Feedback can be provided through the ROCs, and issues can be discussed at the bi-weekly ROC managers meeting and the weekly WLCG operations meeting. Individual problems encountered by sites can be reported through the general support facilities offered by GGUS.

3.5.2. Technical Coordination Group

SA1 provides long-term input to SA3 through the TCG. The TCG brings together the technical activities within the project in order to ensure the oversight and coordination of the technical direction of the project, and to ensure that the technical work progresses according to plan. The full mandate of


the TCG is described at [R 17]. More information on the feedback that SA1 gives to the TCG can be found in section 4.2.

3.5.3. Introduction of new resources

The procedures for the introduction of new resources are well documented [R 6], and information about testing a new site is available [R 7]. We do not know of any major problem or deficiency in this process. The number of resources is growing steadily, as discussed in section 3.2.1.

3.5.4. Status of middleware deployment and introducing new resources

This service continued to run smoothly over the assessment period.

3.6. RESOURCE AND USER SUPPORT

The objective of this service is to receive, respond to, and coordinate the resolution of problems with Grid operations from both Resource Centres and users; this role filters and aggregates problems, providing solutions where known, and engages appropriate experts to resolve new problems. The portal operated by the GGUS service activity is the focal point for user and site support activities within EGEE-II. Users, site administrators, and support personnel can enter problems into the ticketing system maintained by the GGUS site [R 43]. Teams of support personnel (the TPM teams) assign tickets to the appropriate support channels, e.g. a VO-specific problem to the support contact of that VO, or a site problem to the site support contact. A list of available contacts is maintained [R 18]. Users can also contact regional (site or national) helpdesks for problems; support personnel can handle these problems locally or use the GGUS facilities to involve other support channels. A steering group (the ESC) has an important role in the development and maintenance of the GGUS facilities. The group gives advice on different aspects of GGUS, such as CIC, ROC, and VO integration, process documentation, portal enhancements, training, development, and coordination.
A detailed status of the GGUS activity, together with statistics for the first project year, can be found in the PM11 milestone MSA1.8 Assessment of GGUS support [R 19]. Here we present some general statistics for the GGUS operation for the first nine months of 2007.

3.6.1. Global Grid User Support statistics

Several support metrics are defined in [R 4] for which metrics from the GGUS ticketing system can be used. Figure 25 gives an overview of the number of processed tickets. Both newly created and solved tickets for different categories are given on a monthly basis (the number of solved tickets applies only to tickets opened in the same month, so it will always be lower than the number of opened tickets). The total number of tickets also includes the tickets related to network problems, shown as ENOC tickets in the figure. The ENOC collects tickets from the NRENs that agree to send them; there are now 14 NRENs, plus GÉANT2, NorduNet and the e2eCU, sending their tickets to the ENOC. The ENOC forwards to GGUS the tickets that seem relevant (i.e. with possible impact on the grid infrastructure) and follows them up. In the graphs, EGEE plus ENOC tickets therefore gives the total number of tickets. The COD tickets are the part of the EGEE tickets created by the Grid Operator on Duty


service. As can be seen, the number of processed tickets decreases towards the end of the period after a peak in May, despite the increase in the number of resources and jobs towards the end of this period. The reason for the peak in May is the introduction of new critical SAM tests for services. A striking feature of Figure 25 is that the number of solved COD-related tickets is only slightly smaller than the number of opened tickets in this category, which means that the average time to resolution is short. By contrast, there is a larger gap for all EGEE-related tickets (including the COD tickets), so for tickets from other submitters (users, site administrators) the average resolution time is longer; these tickets are probably more difficult to solve.
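The underlying response-time metric (user-support.5: the average time between opening and closing a ticket, per category) can be sketched as follows. The ticket records and category names here are hypothetical, not the actual GGUS schema:

```python
# Sketch of the user-support.5 metric: average time between opening and
# closing a ticket, grouped by category. (Illustrative; the ticket
# records and category names are hypothetical, not the GGUS schema.)

from collections import defaultdict
from datetime import datetime

def avg_response_hours(tickets):
    """Average open-to-close time in hours per category (closed tickets only)."""
    totals = defaultdict(lambda: [0.0, 0])
    for t in tickets:
        if t["closed"] is None:
            continue                   # still open: not counted
        hours = (t["closed"] - t["opened"]).total_seconds() / 3600.0
        totals[t["category"]][0] += hours
        totals[t["category"]][1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}

tickets = [
    {"category": "COD", "opened": datetime(2007, 9, 3, 9),
     "closed": datetime(2007, 9, 3, 21)},        # resolved in 12 h
    {"category": "COD", "opened": datetime(2007, 9, 4, 9),
     "closed": datetime(2007, 9, 5, 9)},         # resolved in 24 h
    {"category": "VO",  "opened": datetime(2007, 9, 1, 0),
     "closed": None},                            # still open: ignored
]
averages = avg_response_hours(tickets)
```

Note that open tickets are excluded from the average, which is one reason the gap between opened and solved tickets in Figure 25 matters when interpreting these numbers.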

Figure 25 Number of tickets, for different categories, for each month in the reporting period


Figure 26 Average GGUS ticket response times in hours. A distinction is made between all tickets and tickets related to VOs, COD, ENOC, middleware, and other. These data relate to the user-support.5 metric. Over the reporting period, only information from April onwards is available. Source: [R 21]

Figure 26 gives statistics on the average response times for tickets (the time between opening and closing a ticket), divided into different categories. Most averages are below a week, however with some peaks well above this for VO-related tickets; whether there is a specific reason for this should be investigated. From Table 20 of the assessment of the GGUS service for the first project year [R 19], it can be inferred that for project months 6 and 9 the average response time was well above 400 hours. This means that a considerable decrease was achieved in the second project year, to less than 100 hours in September 2007. That document also recommended that the response time for tickets should be improved. It is concluded that the GGUS service is firmly established in the project, as can be seen from the number of tickets processed and the improvement in response time.

3.7. GRID MANAGEMENT

The objective of this service is to coordinate the implementation of the operational services of SA1 by the Regional Operations Centres (ROCs), as well as to manage the relations with resource providers through the negotiation of service-level agreements (SLAs), and the relations with the wider Grid community through participation in standards bodies. It is difficult to measure the effectiveness of the organisation of each ROC. We have an indirect indication that all ROCs are performing well from the fact that regions still succeed in adding new resource centres to the production infrastructure, as shown in section 3.2.1.


In the first half of 2007, an internal SA1 review of the status of each federation was completed. A summary of the results of these reviews is presented in deliverable DSA1.6 [R 50], where more details on the operation of the ROCs can be found.

3.7.1. Service Level Agreement management

In the previous assessment deliverable, DSA1.4, it was reported that there had been little progress in establishing SLAs between the ROCs and the production sites; only sites that are part of the WLCG infrastructure were expected to have signed an MoU with CERN [R 37]. At the beginning of 2007, a working group was formed to produce, as a first step, a Service Level Description (SLD) document. In the second half of 2007, the working group made progress, resulting in an SLD document that should be used by all ROCs as a basis for signing SLDs with individual sites [R 44]. The signing process only started in December 2007, so no results can be presented in this document.

3.7.2. Operational Application Group

The role of the OAG is to organise the introduction of new VOs into the production infrastructure. The registration process for new VOs has been simplified considerably compared with EGEE-I; details can be found in deliverable DSA1.4 [R 2]. Since that time, some technical changes have been introduced. Technically, ending the registration of a VO is now possible too; however, several policy documents must still be updated accordingly. In 2007, 20 new requests for registration were made (Q1: 4, Q2: 5, Q3: 2, Q4: 9), which shows that there is still interest from new communities in gaining access to the infrastructure. After registration, a VO must get access to resources, and it is the role of the ROCs to assist in the resource allocation process. Until now, this allocation of resources to new VOs has not worked very well, as the ROCs do not own the resources and cannot force resource centres to give access to VOs.
For the smaller VOs, it is very difficult to negotiate with the resource centres separately. At a meeting of the ROC managers in June 2007, this issue was addressed, and it was agreed that new VOs should, in the first instance, find access to resources in their own region. Secondly, some regions provide a small percentage (5 to 10%) of their resources to generally accepted VOs. This process is continuously followed by the OAG and will be evaluated further.

3.7.3. LHC Tier-1 integration

At the end of 2004, the LHC user community started to set up service challenge activities [R 22] in order to prepare and test the data and compute facilities needed once the Large Hadron Collider starts operation. In the first-year deliverable DSA1.4 [R 2], it was concluded that these services were fully integrated into the EGEE operations. The GridView utility [R 23], developed in the first place for monitoring the Tier-1 services, has been expanded to include statistics on Tier-2 sites as well, and further development to include other EGEE sites is taking place. In addition, statistics on the availability of services over past periods are now available through this tool; see the examples given in section 3.3.2.

3.7.4. Security activities

The security activities are split into different tasks, coordinated by the Security Coordination Group [R 40]. The following responsibilities are defined for SA1:


1. Maintaining the Security and Availability Policy and the policies related to acceptable use by users and VOs. This is covered by the Joint Security Policy Group (JSPG) and is discussed in more detail below.

2. Ensuring the continued existence of a federated identity trust domain, and encouraging the integration of national or community-based authorisation schemes. This is performed through representation in the Policy Management Authority EUGridPMA and the worldwide federation IGTF, and through the operation of CAs in different countries.

3. Analysis of security risks and vulnerabilities in the procedures and software. This task is performed by the Vulnerability Group, led by the UKI ROC.

4. Responding to security incidents. This task is performed by the Operational Security Coordination Team (OSCT) and coordinated by the OCC.

3.7.4.1. Joint Security Policy Group

The LCG Security Group was formed in 2003 and mandated to advise and make recommendations to the LCG Grid Deployment Manager and the LCG Grid Deployment Board (GDB) on matters related to LCG security. With the start of EGEE in April 2004, it was agreed that the remit of this group would expand to meet the needs of both EGEE and LCG. From the early days there has been strong participation by the Open Science Grid (OSG) in the USA, with the aim of defining common policies across EGEE, OSG and LCG. Other Grid infrastructures have more recently joined the group, including DEISA, SEE-Grid and NDGF. The word "Joint" reflects the fact that this body defines and maintains policy for several Grids. More information can be found on the group's website [R 32]. By the end of EGEE phase 1, there was a reasonably complete set of security policy documents, but these were often worded in a project-specific fashion, or were too complicated, and therefore not easily applicable to other Grid infrastructures.
Work during EGEE-II has concentrated on the revision of the existing policy documents to make them simpler and more general, using the word "Grid" rather than the name of a specific project such as EGEE. Several new areas requiring policy have also been identified; these include more extensive handling of VO responsibilities and data privacy issues related to user-level job accounting. The group has produced a set of documents [R 33], shown in Table 4 together with their current status.

Table 4 List of documents produced by the JSPG (document name: status)

Grid Security Policy: Released - recently revised
Grid Acceptable Use Policy: Released
Grid Site Operations Policy: Released - recently finalised
Site Registration Policy: Released
Grid Security Logging and Traceability: Being revised - replaces Audit Requirements Policy


VO Operations Policy: Close to approval
VO Registration Policy: Being revised - replaces VO Security Policy
VO Membership Management Policy: Being revised - replaces User Registration and VO Management Policy
Approval of Certification Authorities: Being revised - to include new IGTF profiles
Grid Policy on the Handling of Job Accounting Data: New policy being prepared

3.7.4.2. Operational Security Coordination Team

The Operational Security Coordination Team (OSCT) provides the operational response to security threats against the EGEE infrastructure. The activity mainly focuses on the handling and resolution of computer security incidents, by providing reporting channels, pan-regional coordination and support. About 10 incidents, of low to medium severity, were handled in 2007. The activity also deals with security monitoring on the Grid and provides best practice and advice to Grid system administrators [R 34]. The OSCT is led by the EGEE/LCG Security Officer and includes security contacts from each EGEE region, who provide support for daily security operations as part of an on-duty rotation scheme. The team collaborates with the other EGEE security groups, including the Middleware Security Group (MWSG), the Security Coordination Group (SCG) and the Joint Security Policy Group (JSPG). A strong collaboration also exists with other projects, including OSG and several NRENs. Another task is the scheduling and performance of service challenges to check the operational readiness for handling security incidents. In 2007, no project-wide service challenge was run, but one is scheduled before the end of EGEE-II. The Security Coordination Group, part of the JRA2 activity, has published a milestone document describing the planning for the auditing of the security activities [R 47]; the security activities performed under SA1 also fall under the planned audits. The conclusion is that the necessary activities have been implemented to perform the different security tasks.


4. SA1 SUPPORTING SERVICES
In this section, we discuss services that are not directly responsible for production but that support the establishment of a sustainable infrastructure.

4.1. INTERNATIONAL COLLABORATION

The purpose of international collaboration is "To drive collaboration with peer organisations in the Americas and the Asia-Pacific region to ensure the interoperability of Grid infrastructures and services so that the EGEE-II user communities, which are frequently international, are able to seamlessly access resources both within and outside Europe". For the high-energy physics applications, global access to data and compute resources is a key element of their success, as groups from all over the world are involved in processing the data produced by the physical instruments. Interoperation of Grid infrastructures can be important for other application domains too, because possible collaborations between application groups are not bound by infrastructure boundaries. Intensive collaboration exists between EGEE and the Open Science Grid (OSG) project [R 27] in the United States. In the operations area, combined workshops have been organised, and OSG attends the weekly operations phone meetings. In addition, the Asia-Pacific region is involved in the operations activity through the Taipei ROC operated by Academia Sinica. One of the results of the collaboration between EGEE and OSG has been the creation of the OPS VO, which is used for running the SAM tests on sites. Both EGEE and OSG accept the OPS VO; within EGEE, acceptance of the OPS VO is a requirement for sites to become certified. Interoperability has been established with OSG; the LHC experiments rely on this as part of their computing strategy and have been using it in production for more than a year. Resources from related projects like BalticGrid and SEE-Grid are also tightly integrated with the EGEE infrastructure, and these resources can be used by VOs from EGEE.
There has not been much development in opening up resources from other projects such as DEISA in Europe, NAREGI (Japan) and TeraGrid (US). EGEE members contribute to the development of international standards through participation in the working groups of the Open Grid Forum (OGF). Important for practical interoperability between infrastructures is a working group called Grid Interoperability Now (GIN), which started at OGF16 in Athens in February 2006 [R 28]. The purpose of the group is described in its charter as follows: "The purpose of this group is to organize and manage a set of interoperation efforts among production Grid projects interested in interoperating in support of applications that require resources in multiple Grids. The results of these interoperations may feed back into the interoperability efforts being conducted by the Standards Working Groups." The group identified four areas on which to focus in planning and implementing interoperation, viz. data location and movement; authentication/authorisation and identity management; job description and execution; and information services. The latest results were demonstrated at SC07, November 2007, in Reno in the United States [R 29], where EGEE was also present with a booth. Nine interoperability demonstrations were given. EGEE participated in a demonstration of interoperability between SRB and SRM, which are interfaces to data


management systems. Another data-related demonstration concerned a WS-DAIR interface for the gLite AMGA Metadata Catalogue, developed within the EGEE project. The use of a common information system between different infrastructures was demonstrated, publishing information from nine production grids. This work shows good progress and active participation from EGEE.

4.2. CAPTURE AND PROVIDE REQUIREMENTS

The objective of this activity is to play a significant role in the capture and provision of middleware requirements. The management of the requirements is carried out by the Technical Coordination Group (TCG), but the middleware requirements provided by SA1 are coordinated within SA1 to trace their status and priority for implementation. The role of the TCG is discussed in section 3.5.2. SA1 organised feedback by asking each region to provide the requests for enhancements that were high on its list. Each region was then asked to vote for the three highest-priority items on the assembled list. The status is maintained in a list [R 30], and most items have now been addressed.

4.3. LONG TERM SUSTAINABILITY

The objective is the long-term sustainability of the infrastructure: to work, both within the project and with the other related infrastructure projects and embryonic National Grid Infrastructures, to put in place the structures and organisation necessary to ensure a long-term sustainable infrastructure. Projects like EGEE-II are only funded for a few years, but a sustainable infrastructure requires a basis for long-term support. Some years ago, discussions started on whether a future European infrastructure could be based on a federation of National Grid Infrastructures. At the EGEE'06 conference in September 2006 in Geneva, sessions on this topic were scheduled [R 31]. In the autumn of 2007, the FP7 EGI_DS (Design Study for a European Grid Initiative) project started [R 49].
At the EGEE'07 conference in October 2007 in Budapest, a workshop was organised by this project [R 48]. Many EGEE partners are involved in this activity, and many of the EGEE ROCs are involved in the national grid initiatives. Within SA1, there is commitment to support the initiative to build a sustainable infrastructure. SA1 deliverables related to the operation of the production environment have been given to EGI_DS as input.

EGEE-II INFSO-RI-031688

© Members of EGEE-II collaboration

PUBLIC

50 / 56

Doc. Identifier:

Assessment of production Grid infrastructure service status

EGEE-II-DSA1.7-v3-0.doc Date: 28/03/2008

5. CONCLUSIONS AND LESSONS LEARNED
The EGEE production infrastructure has continued to expand, and is delivering significant computing and storage capacity to a growing number of applications, several of which now depend on EGEE as their main source of computing. In all future work, the key points of usability, reliability, robustness, and stability should be kept in mind as guiding principles for setting priorities.

5.1. RECOMMENDATIONS

5.1.1. Recommendations from DSA1.4

Deliverable DSA1.4 [R 2] gave a number of recommendations for improvement; progress on these is discussed here.

Monitoring: thanks to the efforts of the Metrics Implementation Group, the information on monitoring has improved considerably. All information is now accessible from a single point [R 5], and for some metrics additional tools have been developed. This work is continuing.

Normalization of compute power in kiloSPECint2000 (kSI2k) units: sites are advised to publish compute power in kSI2k values, and information on how to obtain these values is available from the HEP community [R 51]. However, a transition to SPECint2006 values is needed, as kSI2k is no longer officially supported.

User support: the main observation was that response times for GGUS tickets (the time between opening and closing a ticket) were too long. As discussed in section 3.6.1, these average response times decreased considerably in 2007.

Improve the resource allocation process for VOs: this is discussed in the lessons learned section.

The capture of requirements by the ROCs and the provision of these requirements as input to the TCG: many requirements from SA1 are addressed by the TCG. However, refreshing the list of requirements from SA1 does not get sufficient attention, and improvement is needed.

5.1.2. New recommendations

The following new recommendations result from the current assessment.
Scale of infrastructure: the statistics for the growth in the number of sites show saturation in a number of regions, while other regions still showed an increase during 2007 (section 3.2.1). More insight into the number of sites that can be expected to join is needed, also to understand whether there are factors blocking further growth.

Failover for GOCDB: a study of the dependencies between core services concluded that the GOCDB is critical for a number of other services. Fully functioning failover for this service must become available soon.

Ticket response times for VOs: the response times for tickets assigned to the support teams of some VOs are relatively long. It should be investigated whether this can be improved.

Success rates of jobs: to improve the success rates, the reasons for job failures should be investigated further. The collection of error information can be helpful for this task; see references [R 52], [R 53] and [R 54].

5.2. LESSONS LEARNED

Middleware support: stability and reliability are the key to the overall success of a distributed infrastructure like EGEE. It is important that releases of new middleware do not disrupt the operational state of the service; the introduction of a continuous process of incremental middleware updates was therefore a significant step towards fulfilling this requirement. Complete updates of the middleware proved very cumbersome: it is very difficult to manage such an upgrade in a short interval for all sites. It is also important that dependencies between different services do not rely on particular versions; for example, older APIs should be supported as much as possible in newer versions.

Resource provisioning model: it proved hard to implement an easy-to-deploy procedure for the allocation of resources to new user communities. The registration of new application groups as VOs is well established. However, getting access to resources in a particular region is mostly a process of voluntary contribution if the VO is not directly associated with the region. Some federations, e.g. UKI, contribute a small percentage of their resources to VOs not directly supported within the region. A procedure is needed whereby the exchange of access to resources between different resource providers can be managed easily. With the emerging model of National Grid Initiatives (NGIs) contributing to a wider grid infrastructure, there should be a model for the exchange of resources between the NGIs. For instance, NGIs should allow access to applications not directly associated with the NGI, with access managed on the basis of accounting information, so that on average the usage by an NGI's own applications at other NGIs is compensated by its internal usage from outside. Implementing such a model requires a high-level agreement between the stakeholders.

Management of infrastructure: for maintaining the health of sites, monitoring by the COD teams is very important. For several reasons, such as scaling and the overhead involved, it would be better if sites or NGIs monitored the state of sites more actively themselves, thereby reducing the workload of the COD teams. In principle, sites have the same tools for the detection of problems as the COD teams, and administrators can subscribe to a notification service based on SAM test results. The responsibility for monitoring the site could be made part of the SLD to be signed by sites joining the infrastructure. A project-wide activity for monitoring the state of the production service should always remain; however, it could be more lightweight than it is today. This point is also raised in deliverable DSA1.6 "Report on ROC progress and issues" [R 50].
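The accounting-based compensation model sketched above amounts to keeping a net balance per NGI. The following is a minimal illustration only: the NGI names, the record format and the `exchange_balance` helper are invented for this sketch and are not part of any EGEE or APEL tool.

```python
# Hypothetical sketch of the accounting-based compensation model for
# resource exchange between NGIs. The record format and NGI names are
# invented for illustration; real accounting records look different.

from collections import defaultdict

# Each record: (ngi_of_site, ngi_of_vo, cpu_hours) -- usage consumed at
# a site belonging to `ngi_of_site` by a VO associated with `ngi_of_vo`.
records = [
    ("NGI-A", "NGI-B", 1200.0),   # NGI-B's VOs ran at NGI-A sites
    ("NGI-B", "NGI-A", 1000.0),   # NGI-A's VOs ran at NGI-B sites
    ("NGI-A", "NGI-A", 5000.0),   # internal usage, ignored for the balance
]

def exchange_balance(records):
    """Net CPU-hours each NGI delivered to external VOs minus what its
    own VOs consumed at external sites. A positive balance means the
    NGI contributed more than it consumed."""
    balance = defaultdict(float)
    for site_ngi, vo_ngi, hours in records:
        if site_ngi == vo_ngi:
            continue                  # internal usage does not affect the exchange
        balance[site_ngi] += hours    # delivered to an external VO
        balance[vo_ngi] -= hours      # consumed at an external site
    return dict(balance)

print(exchange_balance(records))      # NGI-A delivered 1200 h, consumed 1000 h
```

In this toy example NGI-A ends up with a surplus of 200 CPU-hours, which is exactly the kind of imbalance a high-level agreement between the stakeholders would have to address.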

6. ANNEXES
6.1. METRICS TABLE

The current status of the implementation of retrieving metrics information can be found at: http://egee-docs.web.cern.ch/egee-docs/list.php?dir=./mig/production/&.

6.2. STATUS TOP LEVEL BDIIS

Each region is supposed to support at least one top-level information system (BDII) in high availability mode. Below is the table with the status of the implementation at each region.

6.3. STATUS TABLE SA1 SUB-SERVICES

Table 5 gives an overview of the status of the different sub-services SA1 has implemented to operate the production service.

Table 5 Status overview SA1 sub-services

Core infrastructure services
    Objective/function: To operate a set of essential services, such as the information services, resource brokers, data management services, administration of the Virtual Organisations (VOs) and other core services agreed with the VOs, that bind distributed resources into a coherent multi-VO infrastructure.
    Status: operational

Grid monitoring and control
    Objective/function: To monitor proactively the operational state of the Grid and its performance, initiating corrective action to remedy problems arising with either core infrastructure or Grid resources.
    Status: SAM tool operational; GridView tool operational; Grid Operator on Duty operational; TCG operational

Middleware deployment and introducing new resources
    Objective/function: To deploy middleware distributions from SA3 on Resource Centres throughout the Grid infrastructure. This involves close interaction and feedback with SA3 and the middleware activities both within and external to the project, as well as with the applications. Where new Resource Centres are to be incorporated into the Grid, assistance must be provided for both middleware installation and introduction of operational procedures.
    Status: number of resource centres increased from 154, as reported one month before the start of EGEE-II, to over 240 at PM 18

Resource and user support
    Objective/function: To receive, respond to, and coordinate the resolution of problems with Grid operations from both Resource Centres and users; this role will filter and aggregate problems.
    Status: GGUS FZK operational; TPM teams operational; ROC and national support teams operational

Grid management
    Objective/function: To co-ordinate the implementation of the above points by the Regional Operations Centres (ROC), as well as to manage the relations with resource providers through negotiation of service-level agreements (SLAs), and relations with the wider Grid community through participating standards bodies.
    Status: OCC operational; ROCs operational; SLD available, not widely deployed; OAG operational; OSCT operational; IGTF/EUGridPMA based identity trust domain operational; Vulnerability Group active; JSPG active

International collaboration
    Objective/function: To drive collaboration with peer organisations in the Americas and the Asia-Pacific region to ensure the interoperability of Grid infrastructures and services so that the EGEE-II user communities, which are frequently international, are able to seamlessly access resources both within and outside Europe.
    Status: GIN activity; OSG collaboration; OGF participation

Capture and provide requirements
    Objective/function: To play a significant role in the capture and provision of middleware requirements. The management of the requirements will be carried out by the Technical Coordination Group (TCG), but middleware requirements provided by SA1 will be coordinated within SA1 to trace the status and priority for implementation of the requirements.
    Status: representation of SA1 in TCG is present and functioning

Long term sustainability of the infrastructure
    Objective/function: To work both within the project and with the other related infrastructure projects and embryonic National Grid Infrastructures to put in place the necessary structures and organisation to ensure a long-term sustainable infrastructure.
    Status: relations with OSG, NorduGrid, SEE-GRID, BalticGrid, EELA, EUMedGrid and EUChinaGrid present

6.4. NUMBER OF REGISTERED USERS PER VO

Table 6. Number of registered members per VO (as of March 2008). This corresponds to metric size.7. The data for this table is taken directly from the CIC portal (including the "unknown" values), and the actual status of this metric is available at http://cic.gridops.org/index.php?section=home&page=volist. Note that this table lists active EGEE VOs only, i.e. VOs with status 'new' and all non-EGEE VOs are not included here.

Virtual Organisation (VO)        Number of registered users

Astrophysics, astro-particle physics:
    inaf                         Unknown
    ams                          Unknown
    argo                         23
    astro.vo.eu-egee.org         4
    astrop                       Unknown
    auger                        26
    icecube                      Unknown
    magic                        23
    pamela                       23
    planck                       30
    virgo                        17
    vo.apc.univ-paris7.fr        9

Biomedical and Bioinformatic Applications:
    embrace                      Unknown
    bio                          1
    biomed                       153
    gene                         Unknown
    libi                         14

Computational chemistry:
    compchem                     45

Computational chemistry (continued):
    enmr.eu                      12
    gaussian                     24
    trgrida                      38

Earth sciences:
    esr                          69
    trgridc                      3

Finance:
    egrid                        29

Fusion:
    fusion                       43
    rfusion                      Unknown

Geophysics:
    eearth                       Unknown
    egeode                       38

High-energy physics:
    alice                        340
    atlas                        1603
    babar                        23
    belle                        1
    calice                       1
    cdf                          1343
    cms                          1126
    cppm                         11
    desy                         20

High-energy physics (continued):
    dzero                        2
    ghep                         45
    gridpp                       35
    hermes                       2
    hone                         32
    ific                         21
    ilc                          88
    ildg                         75
    lhcb                         221
    minos.vo.gridpp.ac.uk        13
    pheno                        33
    photon                       Unknown
    supernemo.vo.eu-egee.org     12
    uscms                        Unknown
    vo.dapnia.cea.fr             21
    vo.lal.in2p3.fr              47
    vo.lapp.in2p3.fr             8
    vo.llr.in2p3.fr              6
    vo.lpnhe.in2p3.fr            5
    vo.sbg.in2p3.fr              15
    zeus                         42

Infrastructure:
    ops                          Unknown
    dteam                        615
    edteam                       40
    eela                         99
    euindia                      42
    gilda                        1119
    infngrid                     205
    pvier                        29
    rdteam                       3
    rgstest                      21
    swetest                      64
    vo.e-ca.es                   2
    vo.grif.fr                   12

Others:
    grid-it                      Unknown
    ngs                          Unknown
    see                          Unknown
    ukqcd                        Unknown
    voce                         Unknown
    aegis                        38
    apesci                       62
    astron                       3
    auvergrid                    32
    balticgrid                   351
    cesga                        33
    cosmo                        2
    crypto.swing-grid.ch         2
    cyclops                      11
    dech                         257
    diligent                     880
    enea                         12
    geant4                       15
    geclipse                     30
    geclipsetutor                100
    gridcc                       40
    gridmosi.ici.ro              8
    hungrid                      1
    imath.cesga.es               9
    lights.infn.it               14
    marine                       Unknown
    ncf                          11
    nordugrid.org                45
    proactive                    9
    seegrid                      1
    solovo                       Unknown
    theophys                     Unknown
    trgridb                      98
    trgridd                      17
    trgride                      20
    trgridf                      5

Others (continued):
    trgridg                      3
    twgrid                       75
    vo.agata.org                 4
    vo.gear.cern.ch              2
    vo.ipno.in2p3.fr             24
    vo.northgrid.ac.uk           6
    vo.plgrid.pl                 1
    webcom                       2

Table 7. Number of Compute Elements (CEs) that support each VO. These numbers were obtained from the top-level BDII in March 2008. Only VOs supported by 10 or more CEs are listed here; the remaining VOs are listed separately below the table. Note that the total number of VOs extracted from the BDII is 293, i.e. many more than the 115 active EGEE VOs listed on the CIC portal.

    ops              333        osg          23        argo        17
    dteam            295        gridex       23        theophys    16
    atlas            232        glow         23        pamela      16
    cms              223        unosat       22        swetest     15
    lhcb             168        sixt         22        egrid       15
    alice            153        sdss         22        auger       15
    biomed           123        ligo         22        t2k         14
    geant4           66         ivdgl        22        fermilab    14
    cdf              62         gridit       22        euchina     14
    dzero            56         gadu         22        enea        14
    esr              50         fmri         22        seegrid     13
    zeus             44         engage       20        bio         13
    ilc              37         voce         19        inaf        12
    infngrid         36         nanohub      19        gamess      12
    hone             35         dech         19        star        11
    babar            35         calice       19        libi        11
    magic            34         balticgrid   19        ingv        11
    compchem         30         virgo        18        grow        11
    see              29         na48         18        gear        11
    planck           28         grase        18        euindia     11
    eela             26         gpn          18        cyclops     11
    mis              25         pheno        17        dosar       10
    fusion           24         osgedu       17
    vo.gear.cern.ch  23         des          17
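Counts like those in Table 7 can be tallied from the access-control rules that CEs publish in the information system (in the Glue 1.x schema, the GlueCEAccessControlBaseRule attribute carries values such as "VO:atlas"). The following is a sketch of the counting step only; the CE names and sample rules are invented, and real values would come from an LDAP query of the top-level BDII.

```python
# Sketch: tally the number of CEs supporting each VO from published
# access-control rules. Sample data is invented for illustration; real
# input would be GlueCEAccessControlBaseRule values from the BDII.

from collections import Counter

# ce_acls: CE identifier -> list of published access-control rules.
ce_acls = {
    "ce01.example.org": ["VO:atlas", "VO:cms", "VO:dteam"],
    "ce02.example.org": ["VO:atlas", "VO:biomed"],
    "ce03.example.org": ["VO:dteam", "VO:ops"],
}

def ces_per_vo(ce_acls):
    """Count, for each VO, the number of CEs publishing a VO:<name> rule."""
    counts = Counter()
    for rules in ce_acls.values():
        vos = {r.split(":", 1)[1] for r in rules if r.startswith("VO:")}
        counts.update(vos)           # each CE counted at most once per VO
    return counts

print(ces_per_vo(ce_acls))           # e.g. atlas is supported by 2 CEs here
```

Since the BDII aggregates whatever sites publish, such a tally also counts stale or test entries, which is one reason the 293 VOs seen in the BDII exceed the 115 active EGEE VOs registered on the CIC portal.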

EGEE-II INFSO-RI-031688

© Members of EGEE-II collaboration

PUBLIC

59 / 56

Doc. Identifier:

Assessment of production Grid infrastructure service status

EGEE-II-DSA1.7-v3-0.doc Date: 28/03/2008

VOs supported by fewer than 10 CEs: nwicg, i2u2, gugrid, egeode, compbiogrid (9), twgrid, rgstest, pvier, mice, mariachi, ltwo, lsgrid, hgdemo, gridcc, belle, auvergrid (8), trgridg, trgridf, trgride, trgridd, trgridc, trgridb, trgrida, scier, sbgrid, nysgrid, ncf, minos, lights.infn.it, gridpp, eumed, dgtest, bgtut (7), tutor, supernemo.vo.eu-egee.org, rfusion, photon, minos.vo.gridpp.ac.uk, icecube, grif, ghep, geclipse, embrace, compassit, compass, cigi, cedar, camont, apc (6), rdteam, ngs.ac.uk, lpnhe, llr, litgrid, lal, ipno, gaussian, eearth, desy, dapnia, dans, biit, astro.vo.eu-egee.org (5), vlemed, totalep, miniboone, hungrid, hepcg, enmr.eu, emutd, edteam, ams (4), vlibu, vlefi, vldbi, sgdemo, phicos, ops.vo.egee-see.org, nwchem.vo.hellasgrid.gr, ngs, ildg, gitest, gin, cosmo, cesga, betest, becms, beapps (3), wisent, vo.southgrid.ac.uk, vo.scotgrid.ac.uk, vo.lal.in2p3.fr, vo.dapnia.cea.fr, vledut, textgrid, ralpp, proactive, ppj, omegac, minos.gridpp.ac.uk, medigrid, manmace, lofar, kerndgrid, ingrid, ific, hp, hermes, gin.ggf.org, g4med, crogrid, cppm, c3grid, astrop, astron, astrogrid, apdg, ail, aegis (2), ws, webcom, vo.u-psud.fr, vo.nanocmos.ac.uk, vo.lpnhe.in2p3.fr, vo.llr.in2p3.fr, vo.lapp.in2p3.fr, vo.ipno.in2p3.fr, vo.ipnl.in2p3.fr, vo.grif.fr, vocet, vo.apc.univ-paris7.fr, vi, uibktest, ufrj, tx, turbomole, tigre, theory, switch, supernemo, solovo, sno, skgrid, sga, scope, sbg, sbc, ridgrid, psud, prtms, pragma, plgrid, pdc, patriot, opssgm, nw_ru, numi, nova, naokek, mipp, md, marine, lqcd, localUsers, lapp, ktev, kg, iusct, itut, itest, iplanck, intec, infnfgrid, in, imon, imath.cesga.es, imain, ifusion, ienvmod, ibrain, hypercp, gridmosi.ici.ro, gene, geant, gd, fermilab-test, estonia, elis, egeebme, dt, cta, crypto.swing-grid.ch, cdms, camont.gridpp.ac.uk, c3, bphys, biomedg, biomath, bfg, bes, atlasj, astro, as, apesci, agata, ad, accelerator (1).
