Why projects? Why not?
One of the questions I often ask myself is “why aren’t more sites using projects?”. As I wander from forum to forum, I regularly see people saying, “I want to consolidate three [application server] instances on my system” (or two [database] instances, or n applications). Many of these applications need to run with identical credentials (user id, group id, authorizations, privileges, etc.) and are distinguishable only by their working directory, environment variables, or the like. Reading these requests is a bit frustrating, as this scenario is one of the key motivations we had when introducing the project(4) database, and I can only conclude that I have failed to communicate its utility.
Projects let you assign a label to a specific workload. In S8 6/00 and all subsequent releases, you can explicitly launch a workload with its appropriate project using the newtask(1) command. If extended accounting has been activated using acctadm(1M) with one of the standard record groupings, then the accounting records for the processes within that workload will include their project ID. Writing an accounting record on every process exit can impact some workloads, so you can instead choose to write records only when each task exits. A task is a new process collective that groups related work within a workload (so it could be a workload component, like a batch submission). Invoked without arguments, acctadm(1M) reports the current status of the extended accounting subsystem:
$ acctadm
            Task accounting: inactive
       Task accounting file: none
     Tracked task resources: none
   Untracked task resources: extended
         Process accounting: inactive
    Process accounting file: none
  Tracked process resources: none
Untracked process resources: extended,host,mstate
            Flow accounting: inactive
       Flow accounting file: none
     Tracked flow resources: none
   Untracked flow resources: extended
The resource lines report which accounting resource groups and resources we can include in each record. We can expand the resource groups for each type of accounting using the -r option.
$ acctadm -r
process:
extended pid,uid,gid,cpu,time,command,tty,projid,taskid,ancpid,wait-status,zone,flag
basic    pid,uid,gid,cpu,time,command,tty,flag
task:
extended taskid,projid,cpu,time,host,mstate,anctaskid,zone
basic    taskid,projid,cpu,time
flow:
extended saddr,daddr,sport,dport,proto,dsfield,nbytes,npkts,action,ctime,lseen,projid,uid
basic    saddr,daddr,sport,dport,proto,nbytes,npkts,action
So we can enable the extended task record by invoking acctadm(1M) like
# acctadm -e extended task
# acctadm -E task
# acctadm -f /var/adm/exacct/task task
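As a rough sketch of how the pieces so far fit together, the shell functions below wrap the three acctadm(1M) invocations and then launch a command under a project with newtask(1). The function names and the overridable file argument are my own illustrative assumptions, not part of Solaris, and the acctadm calls need appropriate privileges.

```shell
#!/bin/sh
# Sketch only: the function names here are hypothetical helpers.

# Enable extended task accounting, writing records to the given file
# (defaults to the path used in the text).
enable_task_accounting() {
    acct_file=${1:-/var/adm/exacct/task}
    acctadm -e extended task &&        # track the extended task resource group
    acctadm -E task &&                 # switch task accounting on
    acctadm -f "$acct_file" task       # send task records to this file
}

# Run a command in a new task under the named project, so that its
# accounting records carry that project's ID.
run_in_project() {
    project=$1
    shift
    newtask -p "$project" "$@"
}
```

For example, `run_in_project user.sch ./batch-job` (where `./batch-job` stands in for your own workload) would start the job in a fresh task under the user.sch project, provided the invoking user is a member of that project.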
In S10, you can optionally enable accounting without having it write to a file; the records are then retrievable using getacct(2).
Of course, that’s all about accounting, but projects are useful even if you’re not interested in the long-term resource consumption of your workloads. The project ID is useful for isolating your workload using conventional /proc-based tools like prstat(1M) and pgrep(1), as well as with DTrace. For instance, to see only the processes in one’s own project, you can use the -J option to pgrep:
$ pgrep -lf -J user.sch
728069 /usr/bin/bash
728027 /usr/bin/bash
125169 /usr/bin/bash
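The same option works in scripts. As a minimal sketch, assuming only that pgrep(1) accepts project names with -J as shown above, this hypothetical helper prints a process count for each project named on its command line:

```shell
#!/bin/sh
# Sketch only: count_by_project is a hypothetical helper name.

# Print "project: N" for each project name given as an argument.
count_by_project() {
    for project in "$@"; do
        # pgrep prints one PID per line; no match yields no lines, so N is 0.
        n=$(pgrep -J "$project" | wc -l | tr -d ' ')
        echo "$project: $n"
    done
}
```

Since pkill(1) takes the same -J option, the same labelling lets you signal every process in a project without matching on command names or arguments.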
To see workloads on the system, you can use prstat’s -J option, which aggregates the activity by project ID, as well as displaying the most active processes:
$ prstat -c -J 1 1
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
653322 xx         19M   17M cpu2     0    3 166:34:10  12% setiathome/1
911046 xx         19M   17M cpu5     0    3 170:28:53  12% setiathome/1
668697 xx         19M   17M cpu4     0    3 138:53:14  12% setiathome/1
100378 daemon   2352K 1944K sleep   60  -20  30:18:23 0.2% nfsd/5
125214 sch      4472K 4152K cpu3     1    0   0:00:00 0.0% prstat/1
100066 root     7872K 6736K sleep   29    0   2:20:42 0.0% picld/13
125169 sch      2768K 2416K sleep    1    0   0:00:00 0.0% bash/1
100156 root       91M   36M sleep   59    0   0:46:59 0.0% poold/8
100249 root     6680K 4848K sleep    1    0   8:02:46 0.0% automountd/2
100254 root     5776K 3552K sleep   59    0   0:00:01 0.0% fmd/10
100262 root     4024K 3424K sleep   59    0   0:19:40 0.0% nscd/57
100265 root     1248K  776K sleep   59    0   0:00:00 0.0% sf880drd/1
100184 root     2288K 1384K sleep    1    0   0:00:00 0.0% ypbind/1
100172 daemon   2680K 1704K sleep   58    0   1:07:32 0.0% rpcbind/1
100158 root     2216K 1336K sleep   59    0   0:00:26 0.0% in.routed/1
PROJID    NPROC  SIZE   RSS MEMORY      TIME  CPU PROJECT
   130        3   56M   51M   0.3% 475:56:17  37% background
     0       61  341M  168M   1.0%  43:21:28 0.3% system
 36565        4   13M   11M   0.1%   0:00:01 0.1% user.sch
105403       14   39M   30M   0.2%   0:00:03 0.0% user.xxxxxxx
 77194       17   74M   62M   0.4%   0:03:10 0.0% user.xxxxxx
Total: 133 processes, 279 lwps, load averages: 3.07, 3.07, 3.04
(This system’s pretty idle during our U.S. shutdown, so it’s doing its best to find extraterrestrial customers.)
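If you only want the per-project summary, a small filter can pull it out of the report. The sketch below assumes the layout shown above (a header row beginning with PROJID, eight columns per project row, and a closing Total: line); the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: project_cpu is a hypothetical helper that reads
# "prstat -c -J" output on stdin and prints "PROJECT CPU" pairs.
project_cpu() {
    awk '
        $1 == "PROJID" { in_projects = 1; next }   # project section starts here
        $1 == "Total:" { in_projects = 0 }          # summary line ends the report
        in_projects && NF >= 8 { print $8, $7 }     # project name, then CPU share
    '
}
```

It would be used as `prstat -c -J 1 1 | project_cpu`, which keeps the parsing separate from however you choose to invoke prstat.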
To limit your DTrace predicates to only a project of interest, use the curpsinfo built-in variable to access the pr_projid field, like
/curpsinfo->pr_projid == $projid && ..../
where I’ve also used the $projid scripting macro, which expands to the result of getprojid(2) for the running DTrace script. You could instead explicitly enter your project ID of interest, or use one of the argument macros if writing a script you expect to reuse.
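As one possible shape for such a reusable script, here is a sketch of a shell wrapper that passes a numeric project ID to DTrace as its first argument macro. The wrapper name and the choice of probe (counting system calls by executable and call name) are illustrative assumptions; only the curpsinfo->pr_projid predicate comes from the discussion above.

```shell
#!/bin/sh
# Sketch only: trace_project_syscalls is a hypothetical helper.
# The shell's $1 is the numeric project ID. Inside the single-quoted
# D program, $1 is DTrace's first argument macro, which DTrace fills
# in from the trailing operand.
trace_project_syscalls() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: trace_project_syscalls projid" >&2
            return 2
            ;;
    esac
    dtrace -n 'syscall:::entry /curpsinfo->pr_projid == $1/ { @[execname, probefunc] = count(); }' "$1"
}
```

Running `trace_project_syscalls 36565` and interrupting it after a while would print the aggregated counts for that project alone. Keeping the D program in single quotes matters here, since otherwise the shell would expand `$1` before DTrace ever saw it.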
Projects also let you place resource controls on your workload, establish its resource pool bindings, and more. We’ll make it easier to use them with the forthcoming service management facility. To summarize: projects are a precise and efficient way to label your workloads (as opposed to pattern matching on arguments or environment variables). If you are consolidating workloads, whether because of machine eliminations, organizational mergers, or other reasons, they are definitely worth considering. If you think there’s a way to make them more applicable to your work, please let me know.