Kill a running job ('bkill 0' kills all jobs submitted by the current user)
bqueues
Displays information about queues
bhosts
Displays hosts and their job status.
lshosts
Displays hosts and their resources (cpu/memory).
lsload
Displays hosts and their load (cpu/memory).
•bsub
% bsub -o [fileName] : Save the job's standard output to a log file (for debugging).
% bsub -e [fileName] : Save the job's standard error to a log file (for debugging).
% bsub -n [number] : Occupy the specified number of processors to run the job.
% bsub -q [queueName] : Run the job on the specified queue.
% bsub -m [hostName] : Run the job on the specified host.
% bsub -P [projectName] : Declare which project the job belongs to.
% bsub -Is : Submit an interactive batch job and create a terminal in shell mode. Be sure to use this option if you want to start an application with a GUI.
% bsub -R [resReq] : Run the job on a host that meets the specified resource requirements.
For more detailed information, see “man bsub”.
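Several of these options are usually combined in a single submission. The following is a minimal sketch that only prints the resulting command line (a dry run) instead of submitting anything. The function name build_bsub_cmd, the log file names job.out and job.err, the queue normal, and the script ./run.sh are placeholders chosen for illustration.

```shell
#!/bin/sh
# Build a bsub command line from the options listed above and print it.
# Nothing is submitted; all names below are illustrative placeholders.
build_bsub_cmd() {
    queue=$1      # value for -q
    project=$2    # value for -P
    slots=$3      # value for -n
    shift 3       # the remaining arguments form the job command
    printf 'bsub -q %s -P %s -n %s -o job.out -e job.err %s\n' \
        "$queue" "$project" "$slots" "$*"
}

build_bsub_cmd normal PJ123 4 ./run.sh
```

Once the printed line looks right, it can be pasted into the shell to submit the job for real.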
Some examples.
====
1. The job needs 4 CPU slots, and every slot needs 100G of memory (4 * 100G = 400G in total). The job belongs to project PJ123. By default memory is reserved per CPU slot, not per job, so 400G is reserved overall.
bsub -P PJ123 -n 4 -R "rusage[mem=100000]" "COMMAND"
2. The job needs 4 CPU slots, and all 4 slots must be on the same host (set with "span[hosts=1]"). The job belongs to project PJ123. Under this constraint, the job can be dispatched once a host has 100G of memory free in total.
bsub -P PJ123 -n 4 -R "span[hosts=1] rusage[mem=100000]" "COMMAND"
3. Submit the job to the pd queue and select hosts that have at least 500G of free memory, 100G of free swap, and 100G of free tmp space. The job belongs to project PJ123. Note that select only picks hosts that currently meet the conditions; it does not reserve resources on them. Use rusage to reserve resources.
bsub -P PJ123 -q pd -R "select[mem>=500000 && swap>=100000 && tmp>=100000]" "COMMAND"
4. Submit the job to the pd queue with 8 CPU slots, reserve 100G of memory and 100G of swap, and select hosts that have at least 100G of free tmp space. The job belongs to project PJ123.
bsub -P PJ123 -q pd -n 8 -R "rusage[mem=100000:swap=100000] select[tmp>=100000]" "COMMAND"
====
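The memory arithmetic in example 1 and the resource strings used in the examples can be sketched as two small helpers. This assumes, as the examples do, that mem values are given in MB and that rusage is applied per slot. The function names are hypothetical and nothing is submitted.

```shell
#!/bin/sh
# Total memory reserved when rusage[mem=...] applies to every slot (values in MB).
total_reserved_mb() {
    slots=$1
    per_slot_mb=$2
    echo $(( slots * per_slot_mb ))
}

# Build the -R string; pass 1 as the second argument to keep all slots on one host.
rusage_string() {
    if [ "$2" = "1" ]; then
        printf 'span[hosts=1] rusage[mem=%s]\n' "$1"
    else
        printf 'rusage[mem=%s]\n' "$1"
    fi
}

total_reserved_mb 4 100000    # prints 400000
rusage_string 100000 1        # prints span[hosts=1] rusage[mem=100000]
```

The printed string can be passed to bsub as the argument of -R, quoted as in the examples.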
•bjobs
% bjobs : Display the job list of the current user.
% bjobs -UF [jobId] : Display detailed job information.
You can get the job PEND reason and the job memory/swap usage with this command.
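To get a quick overview of how many jobs are in each state, the bjobs listing can be summarized with awk. This is a minimal sketch that assumes the default bjobs column layout, where STAT is the third column. The function name count_job_states and the sample listing are illustrative, not real cluster output.

```shell
#!/bin/sh
# Count jobs per state from bjobs-style output read on stdin.
# Assumes a one-line header and STAT in the third column.
count_job_states() {
    awk 'NR > 1 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Illustrative sample; on a cluster use: bjobs | count_job_states
count_job_states <<'EOF'
JOBID   USER    STAT  QUEUE    FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
101     alice   RUN   normal   login1      node01      sim_a      Jan  1 10:00
102     alice   PEND  normal   login1                  sim_b      Jan  1 10:01
103     alice   PEND  pd       login1                  sim_c      Jan  1 10:02
EOF
```

For a job that stays in PEND, bjobs -UF [jobId] then gives the reason.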
•bkill
% bkill -r [jobId] : Kill the specified job by force.
% bkill 0 : Kill all of the current user's jobs.
•bqueues
% bqueues : Display job status on all queues.
Shows the job limit, total job count, and the number of RUN, PEND, and SUSP jobs.
•bhosts
% bhosts : Display hosts and their job status.
Shows the maximum job limit, total job count, and the number of RUN, PEND, and SUSP jobs.
•lshosts
% lshosts : Display hosts and their resources (cpu/memory).
Shows cpu/memory/swap resources.
•lsload
% lsload : Display hosts and their load (cpu/memory).
Shows cpu/memory/swap/tmp load.
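2. openlava Configuration Strategy
2.1 Basic Configuration
The core of openlava configuration is lsb.queues, that is, the queue configuration.
Let us first look at a basic queue definition.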
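Begin Queue
# Queue name
QUEUE_NAME = normal
# Scheduling (fairshare) policy among users
FAIRSHARE = USER_SHARES[[default,1]]
# Maximum number of slots each user can use
UJOB_LIMIT = 48
# Maximum job run time, in hours:minutes
RUNLIMIT = 168:0
# Maximum number of running jobs in the queue
QJOB_LIMIT = 1500
# Hosts in the queue (here "normal" is the name of a host group)
HOSTS = normal
# Queue description
DESCRIPTION = Default queue, for normal jobs.
End Queue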
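Because RUNLIMIT is written as hours:minutes, it can help to convert a value to plain minutes when comparing limits across queues. A minimal sketch follows; the function name runlimit_to_minutes is hypothetical, and the input is assumed to always contain a colon.

```shell
#!/bin/sh
# Convert a RUNLIMIT value in hours:minutes form to total minutes.
runlimit_to_minutes() {
    hours=${1%%:*}           # text before the colon
    minutes=${1#*:}          # text after the colon
    minutes=${minutes#0}     # drop one leading zero so "08" is not read as octal
    minutes=${minutes:-0}    # a bare "0" became empty above; restore it
    echo $(( hours * 60 + minutes ))
}

runlimit_to_minutes 168:0    # prints 10080 (168 hours, i.e. 7 days)
```

With the queue above, a job may therefore run for at most 10080 minutes.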