Slide 1 - PRAGMA Cloud/Grid Operation Center

CSF4 Meta-Scheduler Tutorial
1st PRAGMA Institute
Zhaohui Ding
zhding@ucsd.edu or zhaohui.ding@email.jlu.edu.cn
College of Computer Science & Technology
Jilin University
National Biomedical Computing Resource, University of California, San Diego
Agenda
Meta-scheduler & CSF4 Introduction
CSF4 Architecture
CSF4 Functionalities
Future Work
Demo and Practice
What Is a Meta-Scheduler?
Resource allocation and management for resources that are:
Heterogeneous
Distributed
Dynamic
Local scheduler vs. meta-scheduler
Local Scheduler vs. Meta-Scheduler
| | Local Scheduler | Meta-Scheduler |
|---|---|---|
| Administrative scope | Cluster, single domain | Grid, multiple domains, virtual organizations |
| Hardware & software (OS) | Homogeneous | Heterogeneous, OS-independent |
| Data management | LAN file system (NFS, FTP, scp) | Global file system (GridFTP, Gfarm) |
| Security | OS user/password, NIS, ssh public key | Grid Security Infrastructure (GSI) |
| Resource management protocol | Specific, private protocols for different local schedulers | Standard, open, general-purpose protocols (GRAM) |
| Scheduling mode | Centralized | Centralized / distributed |
Meta-Scheduler vs. Local Scheduler
Local schedulers:
LSF (Load Sharing Facility)
PBS (Portable Batch System)
SGE (Sun Grid Engine)
Condor
IBM LoadLeveler
Meta-schedulers:
CSF
Maui (Silver)
GridWay
Nimrod-G
Condor-G
CSF4
What Is the CSF Meta-Scheduler?
Full name: Community Scheduler Framework
CSF4 consists of a group of grid services hosted in GT4 (Globus Toolkit 4)
CSF4 is a fully WSRF-compliant meta-scheduler
An open-source project, available at http://sourceforge.net/projects/gcsf
Developed by the Laboratory of Distributed Computing and System Architecture, Jilin University, China
CSF4 has been added to Globus Toolkit 4 as an execution component
CSF4 in Globus Toolkit 4
A Typical Deployment
[Figure: a single CSF4 meta-scheduler dispatching jobs to several grid sites, each running Globus middleware in front of a local scheduler: GT2 + LSF, GT2 + PBS, GT4 + SGE, ..., GT2 + Condor]
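To make the deployment concrete, here is a minimal Python sketch of how a meta-scheduler might forward a job to one of several such sites. The `GridSite` record, the first-fit choice, and the `dispatch` function are hypothetical. The adapter names (GramPBS, GramSGE, ...) and the two entry points (GateKeeper for GT2, WS-GRAM for GT4) come from the CSF4 architecture slide, but the mapping table is an illustrative assumption, not CSF4's actual code.

```python
from dataclasses import dataclass

# Hypothetical mapping from local scheduler to the GRAM adapter that fronts it
# (adapter names follow the architecture slide; the table itself is assumed).
ADAPTERS = {
    "Fork": "GramFork",
    "LSF": "GramLSF",
    "PBS": "GramPBS",
    "SGE": "GramSGE",
    "Condor": "GramCondor",
}


@dataclass
class GridSite:
    name: str
    globus: str           # "GT2" or "GT4"
    local_scheduler: str  # key into ADAPTERS
    free_cpus: int


def dispatch(job_cpus: int, sites: list[GridSite]) -> tuple[str, str, str]:
    """Return (site, entry point, adapter) for the first site with enough free CPUs."""
    for site in sites:
        if site.free_cpus >= job_cpus:
            # GT2 sites accept jobs through the GateKeeper, GT4 sites through WS-GRAM.
            entry = "GateKeeper" if site.globus == "GT2" else "WS-GRAM"
            return site.name, entry, ADAPTERS[site.local_scheduler]
    raise RuntimeError(f"no site has {job_cpus} free CPUs")


sites = [
    GridSite("site-a", "GT2", "LSF", 4),
    GridSite("site-b", "GT2", "PBS", 16),
    GridSite("site-c", "GT4", "SGE", 64),
]
print(dispatch(8, sites))   # ('site-b', 'GateKeeper', 'GramPBS')
print(dispatch(32, sites))  # ('site-c', 'WS-GRAM', 'GramSGE')
```

First fit only keeps the sketch short; in CSF4 the choice of site belongs to the queue's scheduling policies.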
What Can CSF4 Do?
Basic functionalities
Submit jobs to the grid without specifying a cluster
Monitor and control jobs
Provide a queuing service
Schedule jobs and resources with custom-built policies
CSF4 Portlet (a web-browser-based user interface)
What Can CSF4 Do? (cont.)
Advanced functionalities
Resource information sharing across multiple domains
Resource scheduling policies at multiple scales
Automatic delegation of user credentials
Automatic data staging
Extensible scheduling framework
Support for grid parallel jobs (MPI and MPICH-G2)
CSF4 – Architecture
[Figure: CSF4 architecture. CSF4 services in the grid environment: Resource Manager Factory Service, Resource Manager GRAM Service, Resource Manager LSF Service, Job Service, Reservation Service, and Queuing Service, with meta-information from WS-MDS. Job entry points: WS-GRAM and the GateKeeper (GT2 environment). Adapters: GramFork (local machine), GramPBS, GramSGE, GramCondor, GramLSF, and gabd. Local schedulers: PBS, SGE, Condor, and LSF.]
CSF4 – Architecture
User view
Local Schedulers and Infrastructure Supported by CSF4
Local schedulers supported
LSF
PBS
SGE
Condor
Infrastructure supported
Globus Toolkit 4
Globus Toolkit 2
CSF4 – Functionalities
Scheduling plug-in framework
Designed for the Queuing Service
Provides a set of policies
Customizable
Extensible
Existing Scheduling Policies
FCFS (First Come, First Served) round-robin: the default policy
Throttle: restricts the number of jobs dispatched in one scheduling cycle
Array Job plug-in: designed for life-science applications such as AutoDock and BLAST
MPICH-G2 plug-in (under development): guarantees that synchronized resource allocation succeeds
Data-intensive application plug-in (under development)
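As a rough illustration of the two simplest policies, the sketch below reads "FCFS round-robin" as taking pending jobs in arrival order and handing them to clusters in rotation, and "Throttle" as capping how many jobs leave the queue in one cycle. Both readings, and the function names, are assumptions made for this example rather than CSF4's implementation.

```python
from itertools import cycle


def throttle(pending: list[str], limit: int) -> tuple[list[str], list[str]]:
    """Split the queue into the jobs dispatched this cycle and those that wait."""
    return pending[:limit], pending[limit:]


def fcfs_round_robin(jobs: list[str], clusters: list[str]) -> list[tuple[str, str]]:
    """Keep arrival order and assign clusters in turn, wrapping around at the end.

    The rotation restarts on every call; a long-running scheduler would keep
    its position across scheduling cycles.
    """
    rotation = cycle(clusters)
    return [(job, next(rotation)) for job in jobs]


pending = ["job1", "job2", "job3", "job4", "job5"]  # in arrival order
this_cycle, waiting = throttle(pending, limit=3)
print(fcfs_round_robin(this_cycle, ["lsf-cluster", "pbs-cluster"]))
# [('job1', 'lsf-cluster'), ('job2', 'pbs-cluster'), ('job3', 'lsf-cluster')]
print(waiting)  # ['job4', 'job5']
```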
Scheduling Plug-ins and Scheduling Policies
Each policy is implemented inside a scheduling plug-in module
A queue can load multiple plug-in modules
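One way to picture a queue that loads several plug-in modules is a chain in which each plug-in filters or reorders the jobs handed on by the previous one. The Python sketch below assumes that chaining model; the `Queue` class, `load_plugin`, and the plug-in signature are hypothetical and are not taken from CSF4's real plug-in API.

```python
from typing import Callable

Job = dict
# Hypothetical plug-in interface: take the jobs still eligible in this cycle and
# return the ones to keep, in the order they should be dispatched.
Plugin = Callable[[list[Job]], list[Job]]


class Queue:
    def __init__(self, name: str) -> None:
        self.name = name
        self.plugins: list[Plugin] = []
        self.pending: list[Job] = []

    def load_plugin(self, plugin: Plugin) -> None:
        self.plugins.append(plugin)

    def schedule_cycle(self) -> list[Job]:
        selected = list(self.pending)
        for plugin in self.plugins:  # plug-ins run in the order they were loaded
            selected = plugin(selected)
        chosen = {job["id"] for job in selected}
        # Jobs chosen in this cycle leave the queue; the others wait for the next one.
        self.pending = [job for job in self.pending if job["id"] not in chosen]
        return selected


q = Queue("normal")
q.pending = [{"id": 1, "prio": 0}, {"id": 2, "prio": 5}, {"id": 3, "prio": 1}]
q.load_plugin(lambda jobs: sorted(jobs, key=lambda j: j["prio"], reverse=True))
q.load_plugin(lambda jobs: jobs[:2])  # a throttle-style plug-in
print([job["id"] for job in q.schedule_cycle()])  # [2, 3]
print([job["id"] for job in q.pending])           # [1]
```

Keeping the interface this narrow is what makes such a framework easy to extend: a new policy is one more function in the chain.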
Resource Information Sharing
An MDS information provider for CSF4
Multiple CSF4 instances can share resource information
CSF4 – Functionalities (cont.)
Deploying multiple CSF4 instances in a grid community
[Figure: sites A to D upload and download cluster information (published through MDS and GDIA) to and from a GDIA Community Center; CSF4 instances at sites A, B, and C negotiate with one another and access local schedulers including LSF 6.1, SGE 5.3, SGE 6.0, OpenPBS 2.3.16, and Condor]
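The upload/download pattern in the figure can be sketched as a shared registry: each site replaces its own snapshot of cluster information, and every CSF4 instance downloads the merged view. The `CommunityCenter` class and its methods are hypothetical stand-ins for the GDIA Community Center and MDS, and the CPU counts are illustrative values.

```python
class CommunityCenter:
    """Hypothetical shared registry in which each site publishes its clusters."""

    def __init__(self) -> None:
        self._by_site: dict[str, dict[str, dict]] = {}

    def upload(self, site: str, clusters: dict[str, dict]) -> None:
        # Replace the site's previous snapshot so stale clusters disappear.
        self._by_site[site] = dict(clusters)

    def download(self) -> dict[str, dict]:
        # Merge all snapshots; prefix names with the site to avoid collisions.
        return {
            f"{site}/{cluster}": info
            for site, clusters in self._by_site.items()
            for cluster, info in clusters.items()
        }


center = CommunityCenter()
center.upload("A", {"cluster1": {"scheduler": "LSF 6.1", "free_cpus": 32}})
center.upload("B", {"cluster1": {"scheduler": "SGE 5.3", "free_cpus": 8}})
view = center.download()  # what any CSF4 instance would see
print(sorted(view))  # ['A/cluster1', 'B/cluster1']
```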
Array Job
AutoDock- and BLAST-like applications
A large number of sub-jobs
All execute the same binary
Each with different input/output files
Array Job (cont.)
[Figure: an array job (Executable: autogrid4, Input: hsg.gpf, Output: hsg.glg, Array Size: 100) is submitted to the CSF4 meta-scheduler, which splits it into 100 sub-jobs running autogrid4 with input hsg.gpf.1 / output hsg.glg.1, input hsg.gpf.2 / output hsg.glg.2, ..., input hsg.gpf.100 / output hsg.glg.100]
Advantages
Submit the job only once
Saves submission time and memory
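The split step in the figure is mechanical enough to show directly. This Python sketch assumes the naming convention visible on the slide (a `.N` suffix on the input and output files of sub-job N); the function name and the dictionary layout are hypothetical.

```python
def split_array_job(executable: str, input_file: str, output_file: str,
                    size: int) -> list[dict[str, str]]:
    """Expand one array job into `size` sub-jobs whose file names end in .1 ... .size."""
    return [
        {
            "executable": executable,
            "input": f"{input_file}.{i}",
            "output": f"{output_file}.{i}",
        }
        for i in range(1, size + 1)  # sub-jobs are numbered from 1, as on the slide
    ]


sub_jobs = split_array_job("autogrid4", "hsg.gpf", "hsg.glg", size=100)
print(len(sub_jobs))          # 100
print(sub_jobs[0])            # {'executable': 'autogrid4', 'input': 'hsg.gpf.1', 'output': 'hsg.glg.1'}
print(sub_jobs[-1]["input"])  # hsg.gpf.100
```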
Data Staging
Manual data staging:
Which clusters can I use?
Which clusters will my jobs run on?
Where is the output data?
When will the job finish, so that I can stage out the output data?
Manual Data Staging
Without a meta-scheduler:
[Figure: the user manually stages input data in to one of several clusters, submits the job there, and manually stages the output data out]
Automatic Data Staging
With CSF4 automatic data staging:
[Figure: the user submits the job to the CSF4 meta-scheduler, which submits it to one of several clusters; input and output data move between the user and the cluster over GridFTP]
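What CSF4 automates here can be summarized as an ordered plan: copy the inputs to the chosen cluster, run the job, copy the outputs back. The Python sketch below only builds that plan as `globus-url-copy` command lines (the GridFTP client shipped with the Globus Toolkit). The `StagedJob` record, the host names, and the scratch-directory layout are hypothetical, and the sketch does not reproduce how CSF4 actually passes staging directives to GRAM.

```python
import posixpath
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class StagedJob:
    executable: str
    inputs: list[str]   # gsiftp:// URLs of input files on the user's side
    outputs: list[str]  # file names the job writes in its working directory
    result_dir: str     # gsiftp:// URL of the directory that receives the outputs


def staging_plan(job: StagedJob, cluster_url: str, workdir: str) -> list[list[str]]:
    """Ordered commands for one job: stage in, run, stage out."""
    plan = []
    for src in job.inputs:
        name = posixpath.basename(urlparse(src).path)
        plan.append(["globus-url-copy", src, f"{cluster_url}{workdir}/{name}"])
    plan.append([job.executable])  # assumed to run with `workdir` as its cwd
    for name in job.outputs:
        plan.append(["globus-url-copy", f"{cluster_url}{workdir}/{name}",
                     f"{job.result_dir}/{name}"])
    return plan


job = StagedJob("autogrid4", ["gsiftp://user.example.org/home/me/hsg.gpf"],
                ["hsg.glg"], "gsiftp://user.example.org/home/me/results")
for step in staging_plan(job, "gsiftp://cluster.example.org", "/scratch/job42"):
    print(" ".join(step))
```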
Integrate CSF4 with Gfarm
With CSF4 automatic data staging and Gfarm:
[Figure: the user creates the input data in Gfarm, with GridFTP as a transfer path, and submits the job to the CSF4 meta-scheduler, which submits it to a cluster; input and output data are exchanged through Gfarm]
Application-Based Scheduling
[Figure: the user passes resource requirements to the CSF4 meta-scheduler, whose scheduling modules match them against the available resource lists of several clusters, each advertising applications and CPU counts (e.g. AutoDock on 1, 8, 12, 64, 100, or 1990 CPUs; NAMD on 24 CPUs; BLAST; other resources)]
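Matching resource requirements against the advertised resource lists can be sketched as a filter followed by an ordering. In this Python sketch the `Offer` record, the largest-first ordering, and the cluster names are assumptions; the CPU counts are illustrative values in the spirit of the figure, not data from a real deployment.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    cluster: str
    application: str
    cpus: int


def match(application: str, cpus_needed: int, offers: list[Offer]) -> list[Offer]:
    """Offers that run the requested application with enough CPUs, largest first."""
    fits = [o for o in offers
            if o.application == application and o.cpus >= cpus_needed]
    return sorted(fits, key=lambda o: o.cpus, reverse=True)


offers = [
    Offer("cluster-1", "AutoDock", 100),
    Offer("cluster-1", "NAMD", 24),
    Offer("cluster-2", "AutoDock", 1990),
    Offer("cluster-3", "AutoDock", 8),
    Offer("cluster-3", "BLAST", 64),
]
print([o.cluster for o in match("AutoDock", 64, offers)])  # ['cluster-2', 'cluster-1']
print(match("NAMD", 32, offers))  # []
```

Largest-first sends the job where the most headroom remains; a best-fit policy (smallest sufficient offer first) would be an equally valid plug-in.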
CSF4 User Interface
CSF4 Portal
CSF4 User Interface
CSF4 Command Line
Work Under Development
Cluster selection
Resource pre-check
Resource re-assignment
Virtual resource pool
[Figure: CSF4 and VJMgr place virtual jobs (vj) on LSF, PBS, and SGE clusters that also run real jobs (rj) and have busy hosts; the interactions are numbered 1 to 5]
vj: virtual job, rj: real job
Demo & Practice
https://www.nbcr.net/pub/wiki/index.php?title=CSF4_Tutorial_PRAGMA13
Thank you
감사합니다
ありがとうございます
謝謝
谢谢