In most Unix environments, the startup process consists of a handful of autonomous boot scripts. They act independently of one another, unaware of which scripts have already run or which will run after them. When they are invoked, there is no serious error checking and no recourse if a script fails.
For Solaris 10, Sun introduced the Service Management Facility. SMF is a framework that handles system boot-up, process management, and self-healing. It addresses the shortcomings of startup scripts and creates an infrastructure to manage daemons after the host has booted.
A System V Unix host will start the sendmail daemon with the script S80sendmail from a run-level directory such as /etc/rc3.d. The script contains commands to start or stop sendmail, depending on how it is invoked. The S portion of the filename denotes that this is a startup script, and the 80 is a sequence number that determines when the script should run.
When S80sendmail runs, it won't be aware of any previous problems, such as an NIS failure or /var never properly mounting. You could write tests into the script, but that increases startup time and the complexity of each script.
In the SMF environment, sendmail is a service. Solaris 10 defines a service as a persistent program that handles system or user requests. Services are expected to be fault tolerant and manageable by the operating system.
Services are identified by a URI known as a Fault Management Resource Identifier. The FMRI is organized as a category hierarchy that helps identify the service and what it is responsible for.
Here are the FMRIs for sendmail, ssh, and other services running on a host:
svc:/network/smtp:sendmail
svc:/network/ssh:default
svc:/system/filesystem/local:default
lrc:/etc/rc2_d/S99audit
Here is the breakdown of the FMRI structure:
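As a rough illustration of that structure, the sketch below splits an FMRI into a scheme, a category path, a service name, and an instance using POSIX shell parameter expansion. The function name fmri_parts is made up for this example, and it assumes the short svc:/... form without a hostname.

```shell
#!/bin/sh
# fmri_parts FMRI
# Hypothetical helper: split an FMRI such as svc:/network/smtp:sendmail
# into its parts. Assumes the short form (no //hostname component).
fmri_parts() {
    fmri=$1
    scheme=${fmri%%:*}        # text before the first colon: svc or lrc
    rest=${fmri#*:/}          # drop the leading "svc:/" or "lrc:/"
    case $rest in
        *:*) instance=${rest##*:}; path=${rest%:*} ;;
        *)   instance=;            path=$rest ;;     # no instance given
    esac
    name=${path##*/}          # last path segment is the service name
    case $path in
        */*) category=${path%/*} ;;
        *)   category= ;;
    esac
    printf 'scheme=%s category=%s service=%s instance=%s\n' \
        "$scheme" "$category" "$name" "$instance"
}

fmri_parts svc:/network/smtp:sendmail
# prints: scheme=svc category=network service=smtp instance=sendmail
fmri_parts svc:/system/filesystem/local:default
# prints: scheme=svc category=system/filesystem service=local instance=default
```

Legacy run scripts have no instance, so for lrc:/etc/rc2_d/S99audit the instance field comes back empty.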
Each service has a manifest that describes the service and its management needs. It lists the service dependencies, the control scripts, and the actions to take when the service fails. The manifest starts out as an XML file that SMF imports into a central repository, which records the properties of all the services.
Sendmail will not run without the following dependencies:
Services in the SMF environment start up in parallel, but each service will become available only when all its listed dependencies are. This means the host will have a faster boot-up, and it will reduce the chances of a cascading failure of services. There is no explicit order to service startup, so sendmail or its dependencies could start up at any time.
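The rule that a service comes online only once its dependencies are online can be modeled in a few lines of shell. This is a conceptual sketch only, not how svc.startd is implemented; the function deps_ready and the three dependency names are hypothetical.

```shell
#!/bin/sh
# deps_ready "ONLINE SERVICES" DEP...
# Toy model: succeed only if every DEP appears in the space-separated
# list of services that are already online.
deps_ready() {
    online=" $1 "
    shift
    for dep in "$@"; do
        case $online in
            *" $dep "*) ;;                          # this dependency is online
            *) echo "waiting on $dep"; return 1 ;;  # first missing dependency
        esac
    done
    echo "ready"
}

# Hypothetical dependency names, for illustration only.
online="filesystem/local network/loopback"
deps_ready "$online" filesystem/local network/loopback system-log ||
    echo "service stays offline for now"
deps_ready "$online system-log" filesystem/local network/loopback system-log
```

The first call reports the missing dependency; once system-log is added to the online list, the second call prints "ready". Start order does not matter, only whether the dependencies are satisfied.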
Almost all services under SMF are controlled by one service known as the restarter. The restarter controls the svc.startd daemon, which in turn starts the other services, tests their dependencies, and restarts them if they fail. When Solaris 10 boots up, svc.startd is one of the first programs spawned from init.
It's still possible to use rcN.d scripts under Solaris 10; however, the programs started from these scripts will not be under SMF control. These are referred to as legacy run scripts. They have an FMRI, like normal services do, but the scheme prefix is lrc:. Legacy run scripts are not initialized until all SMF services are up and running. When the host shuts down, their stop scripts run first, before the SMF services are disabled.
The two most common commands used to administer services are svcs and svcadm. The svcs command reports on the state of configured services, while the svcadm command controls the services.
$ svcs
STATE          STIME    FMRI
...
legacy_run     Sep_22   lrc:/etc/rc2_d/S99audit
...
online         Sep_22   svc:/system/svc/restarter:default
online         Sep_22   svc:/system/filesystem/autofs:default
online         Sep_22   svc:/system/system-log:default
online         Sep_22   svc:/network/smtp:sendmail
online         Sep_22   svc:/system/filesystem/local:default
online         Sep_22   svc:/network/ssh:default
online         Sep_22   svc:/system/dumpadm:default
online         Sep_22   svc:/network/loopback:default
...
Running svcs without arguments lists all running (online) services. The STATE column reports the service status; STIME refers to when the service state last changed; and FMRI identifies the service. If you want to list all services, not just those that are running, use the -a option.
The svcs command can also examine a single service by using either a full or partial FMRI. You can add the -v or -x option for extended output on the service. The -d option will list all the dependencies of a service.
$ svcs svc://localhost/network/ssh:default
STATE          STIME    FMRI
online         Sep_22   svc:/network/ssh:default
$ svcs -v svc:/network/ssh
STATE          NSTATE        STIME    CTID   FMRI
online         -             Sep_22       52 svc:/network/ssh:default
$ svcs -x network/ssh
svc:/network/ssh:default (SSH server)
 State: online since Thu Sep 22 07:51:15 2005
   See: sshd(1M)
   See: /var/svc/log/network-ssh:default.log
Impact: None.
$ svcs -d ssh
STATE          STIME    FMRI
online         Sep_22   svc:/network/loopback:default
online         Sep_22   svc:/network/physical:default
online         Sep_22   svc:/system/cryptosvc:default
online         Sep_22   svc:/system/filesystem/local:default
online         Sep_22   svc:/system/utmp:default
online         Sep_22   svc:/system/filesystem/autofs:default
You can add the hostname localhost to an FMRI, or you can abbreviate it by removing the instance name and/or the categories. If the abbreviation results in multiple matches, they will all be listed. Here are two services that each have the name local in the last segment of the service name:
$ svcs local
STATE          STIME    FMRI
online         Sep_22   svc:/system/device/local:default
online         Sep_22   svc:/system/filesystem/local:default
You can also perform basic glob matching on service names:
$ svcs "*network*"
STATE          STIME    FMRI
disabled       Sep_22   svc:/network/rpc/keyserv:default
disabled       Sep_22   svc:/network/rpc/nisplus:default
disabled       Sep_22   svc:/network/nis/client:default
.....
online         Sep_22   svc:/network/nfs/client:default
online         Sep_22   svc:/network/security/ktkt_warn:default
online         Sep_22   svc:/network/telnet:default
online         Sep_22   svc:/network/nfs/rquota:default
$
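Because svcs prints one service per row in fixed columns, its output is easy to filter in a script. The sketch below reads svcs-style output on standard input and reports every row whose state is neither online nor legacy_run. The function name not_online is invented for this example, and the sample rows are copied from the listing above.

```shell
#!/bin/sh
# not_online: read `svcs` output on stdin and print "STATE FMRI" for
# every service row that is not online. Skips the header line and
# legacy_run rows, which never report "online".
not_online() {
    awk 'NR > 1 && NF == 3 && $1 != "online" && $1 != "legacy_run" { print $1, $3 }'
}

# Sample rows taken from the listing above.
printf '%s\n' \
    'STATE          STIME    FMRI' \
    'disabled       Sep_22   svc:/network/rpc/keyserv:default' \
    'online         Sep_22   svc:/network/nfs/client:default' \
    'online         Sep_22   svc:/network/telnet:default' | not_online
# prints: disabled svc:/network/rpc/keyserv:default
```

On a Solaris 10 host the same filter could be fed from the real command, for example with svcs -a | not_online.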
Services can manage a running process or an OS state. By using the -p option with svcs, you can identify the processes associated with a service.
$ svcs -p svc:/network/ssh
STATE          STIME    FMRI
online         Sep_22   svc:/network/ssh:default
               Sep_22        345 sshd
The time the process started is listed under the STIME column.
In some cases, services do not have running processes associated with them. Tasks such as bringing a network interface up or mounting a disk partition do not require continuously running processes. The svc:/system/filesystem/local:default service runs the mount command once to mount all local filesystems, and then the script exits. SMF refers to these as transient services.
$ svcs -p svc:/system/filesystem/local:default
STATE          STIME    FMRI
online         Sep_22   svc:/system/filesystem/local:default
Finally, there are services that have running processes only when they are in use. When Sun designed the Service Management Facility, it merged in the behavior of inetd and the way it handles network daemons. All the daemons that previously appeared in the /etc/inetd.conf file are now SMF-managed services. The difference is that these services use the inetd daemon as a starter, instead of svc.startd.
$ svcs -p rlogin
STATE          STIME    FMRI
online         Sep_22   svc:/network/login:rlogin
$ rlogin localhost
Password:
Last login: Sun Feb 19 23:49:56 from localhost
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
$ svcs -p rlogin
STATE          STIME    FMRI
online         Sep_22   svc:/network/login:rlogin
               23:50:41    23833 in.rlogind
               23:50:41    23836 bash
               23:50:48    23840 svcs
$ exit
logout
Connection to localhost closed.
$ svcs -p rlogin
STATE          STIME    FMRI
online         Sep_22   svc:/network/login:rlogin
If you kill a process under the control of service management, the program that originally started it will restart it. Here's an example of an Apache2 service that has been running since January 5. First, I double-checked the service by grepping for the process IDs, which match the ones listed with the service. Then, I sent the TERM signal to the parent of all of the child processes.
# svcs -p http
STATE          STIME    FMRI
online         Jan_05   svc:/application/http:apache2
               Jan_05      12377 httpd
               Jan_05      12378 httpd
               Jan_05      12379 httpd
               Jan_05      12380 httpd
# ps -ef | grep http
    root 12377     1  0   Jan 05 ?        2:14 /opt/apache2/bin/httpd -DPERL
    root 23521 23520  0 20:33:01 pts/1    0:00 grep http
    http 12378 12377  0   Jan 05 ?        0:00 /opt/apache2/bin/httpd -DPERL
    http 12380 12377  0   Jan 05 ?        0:00 /opt/apache2/bin/httpd -DPERL
# kill -TERM 12377
# ps -ef | grep http
    root 23527 23520  0 20:33:25 pts/1    0:00 grep http
    root 23580     1  0 20:33:09 ?        0:01 /opt/apache2/bin/httpd -DPERL
    http 23581 23580  0 20:33:10 ?        0:00 /opt/apache2/bin/httpd -DPERL
    http 23582 23580  0 20:33:12 ?        0:00 /opt/apache2/bin/httpd -DPERL
    http 23583 23580  0 20:33:12 ?        0:00 /opt/apache2/bin/httpd -DPERL
# svcs -p svc:/application/http:apache2
STATE          STIME    FMRI
online         20:33:09 svc:/application/http:apache2
               20:33:09    23580 httpd
               20:33:10    23581 httpd
               20:33:11    23582 httpd
               20:33:11    23583 httpd
I then rechecked for the httpd processes and found that the svc.startd daemon had started new Apache servers. Then I examined the http service. It reported that the service time had changed, and listed the new process IDs.
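The manual comparison of process IDs can also be scripted. The sketch below pulls the PID column out of svcs -p output; the function name service_pids is made up for this example, and it assumes process rows have exactly three fields (start time, PID, command), as they do in the listings above.

```shell
#!/bin/sh
# service_pids: read `svcs -p` output on stdin and print one PID per line.
# Process rows look like "STIME PID COMMAND"; service rows and the header
# never have an all-digit second field, so they are skipped.
service_pids() {
    awk 'NF == 3 && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Sample rows taken from the Apache listing above.
printf '%s\n' \
    'STATE          STIME    FMRI' \
    'online         20:33:09 svc:/application/http:apache2' \
    '               20:33:09    23580 httpd' \
    '               20:33:10    23581 httpd' | service_pids
# prints 23580 and 23581, one per line
```

Saving the output of svcs -p http | service_pids before and after the kill would show the complete change of process IDs at a glance.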
The following table lists some SMF services, their associated processes, and their restarter FMRI:
If you want to know the restarter for a service, use svcs -l. Use svcs -R with a full FMRI to list all of the services a restarter service controls.
$ svcs -l network/ssh
fmri         svc:/network/ssh:default
name         SSH server
enabled      true
state        online
next_state   none
state_time   Thu Sep 22 07:51:15 2005
logfile      /var/svc/log/network-ssh:default.log
restarter    svc:/system/svc/restarter:default
contract_id  52
dependency   require_all/none svc:/system/filesystem/local (online)
dependency   optional_all/none svc:/system/filesystem/autofs (online)
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/network/physical (online)
dependency   require_all/none svc:/system/cryptosvc (online)
dependency   require_all/none svc:/system/utmp (online)
dependency   require_all/restart file://localhost/etc/ssh/sshd_config (online)
$ svcs -R svc:/system/svc/restarter:default
STATE          STIME    FMRI
disabled       Sep_22   svc:/system/metainit:default
disabled       Sep_22   svc:/network/rpc/keyserv:default
online         Sep_22   svc:/system/svc/restarter:default
online         Sep_22   svc:/network/pfil:default
online         Sep_22   svc:/milestone/name-services:default
online         Sep_22   svc:/network/loopback:default
....
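The svcs -l output is a list of keyword and value pairs, which makes individual fields simple to extract. The following sketch defines two small filters, restarter_of and deps_of (both names invented here), and runs them on a few lines copied from the output above.

```shell
#!/bin/sh
# restarter_of: read `svcs -l` output on stdin, print the restarter FMRI.
restarter_of() {
    awk '$1 == "restarter" { print $2 }'
}

# deps_of: print each dependency as "FMRI (state)".
deps_of() {
    awk '$1 == "dependency" { print $3, $4 }'
}

# Sample lines taken from the svcs -l listing above.
sample='fmri         svc:/network/ssh:default
restarter    svc:/system/svc/restarter:default
dependency   require_all/none svc:/network/loopback (online)
dependency   require_all/none svc:/system/utmp (online)'

printf '%s\n' "$sample" | restarter_of
# prints: svc:/system/svc/restarter:default
printf '%s\n' "$sample" | deps_of
# prints: svc:/network/loopback (online)
#         svc:/system/utmp (online)
```

On a live host the input would come from the command itself, as in svcs -l network/ssh | restarter_of.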
Enable or disable a service using the svcadm command.
# svcs -x telnet
svc:/network/telnet:default (Telnet server)
 State: online since Thu Sep 22 07:51:11 2005
   See: in.telnetd(1M)
   See: telnetd(1M)
Impact: None.
# svcadm disable svc:/network/telnet:default
# svcs -x telnet
svc:/network/telnet:default (Telnet server)
 State: disabled since Sun Feb 19 23:32:40 2006
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: in.telnetd(1M)
   See: telnetd(1M)
Impact: This service is not running.
The configuration state of a service is recorded in the service repository, so changes to that state persist across reboots. If you disable telnet, rebooting the host won't bring it back up. You must explicitly reenable it from the command line. To make a temporary change to the state of a service, add the -t option to svcadm enable or svcadm disable:
# svcadm disable -t network/telnet
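A state change requested through svcadm may take a moment to complete, so a script that depends on the result might poll the service state. The sketch below is one way to do that, assuming svcs -H -o state prints just the state of the named service; the function wait_for_state and its default of 30 attempts are choices made for this example.

```shell
#!/bin/sh
# wait_for_state FMRI STATE [TRIES]
# Poll the service once a second until it reports STATE, or give up
# after TRIES attempts (default 30). Returns 0 on success, 1 on timeout.
wait_for_state() {
    fmri=$1 want=$2 tries=${3:-30}
    state=unknown
    while [ "$tries" -gt 0 ]; do
        # -H omits the header, -o state selects the STATE column only;
        # tr strips any column padding.
        state=$(svcs -H -o state "$fmri" 2>/dev/null | tr -d ' ')
        [ "$state" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    echo "$fmri is '$state', expected '$want'" >&2
    return 1
}

# Example use on a Solaris 10 host, as root:
#   svcadm disable -t network/telnet &&
#       wait_for_state network/telnet disabled 10
```

The function only reads state, so it is safe to run as an unprivileged user; only the svcadm call itself needs administrative rights.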
There are six different service states for configured SMF services.
The process of starting or stopping a service is described in the service manifest. Most services have a method script associated with them that handles starting and stopping the service, just like an rc script. The restarter service runs this script to bring the service online or offline.
The svcadm command gives administrators a standard interface for controlling services. It recognizes several service management subcommands, including enable, disable, restart, refresh, clear, and milestone.
The svcadm command is pickier about wildcards than svcs. You can still use abbreviated FMRIs and wildcards, as long as they match only one full FMRI.
# svcadm refresh svc:/network/login
svcadm: Pattern 'svc:/network/login' matches multiple instances:
        svc:/network/login:rlogin
        svc:/network/login:klogin
        svc:/network/login:eklogin
# svcadm refresh "svc:/*rlogin"
# svcs "*rlogin"
STATE          STIME    FMRI
online         23:24:17 svc:/network/login:rlogin
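In a script, it can help to confirm that a pattern matches exactly one instance before handing it to svcadm. The sketch below does this with svcs -H -o fmri; the function name resolve_one is invented for this example.

```shell
#!/bin/sh
# resolve_one PATTERN
# Print the single FMRI that PATTERN matches. Fail with a message on
# stderr if it matches no instance or more than one.
resolve_one() {
    # -H omits the header, -o fmri prints only the FMRI column.
    matches=$(svcs -H -o fmri "$1" 2>/dev/null)
    # Count non-empty lines; grep -c exits non-zero when the count is 0.
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "pattern '$1' matches $count instances" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

# Example use on a Solaris 10 host, as root:
#   fmri=$(resolve_one '*rlogin') && svcadm refresh "$fmri"
```

This keeps the forgiving pattern matching of svcs for lookup while passing svcadm an unambiguous FMRI.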
Because rc scripts are no longer the preferred method used to manage programs, Sun has enhanced the runlevel model with service milestones.
In Unix, runlevel one is single user mode, two is multiuser mode, and three is multiuser mode with file sharing or network services. In each runlevel, there is a core set of services that must be brought online.
For example, levels one, two, and three all require a minimum set of local filesystems to be mounted and network interfaces to be online. Runlevel two requires all internet services to be online, and users must be able to log on to the host. Runlevel three requires everything level two does, plus the ability to share files by NFS.
Milestones are services that don't run any applications but do have a list of services they depend on. Once those services are online, the milestone is marked online. The milestone ensures that an expected group of services is up and running, so you don't have to check each individual service.
Here is a list of milestones currently online. In this case, seven milestones are online because they all had their dependencies met.
$ svcs "svc:/milestone/*"
online         Sep_22   svc:/milestone/name-services:default
online         Sep_22   svc:/milestone/network:default
online         Sep_22   svc:/milestone/devices:default
online         Sep_22   svc:/milestone/single-user:default
online         Sep_22   svc:/milestone/sysconfig:default
online         Sep_22   svc:/milestone/multi-user:default
online         Sep_22   svc:/milestone/multi-user-server:default
Here is a list of milestones and their equivalent rc levels.
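|Milestone|rc level|Description|
|milestone/network||Network interfaces online|
|milestone/sysconfig||Basic system configuration|
|milestone/name-services||Any one of the NIS, NIS+, DNS, or LDAP services|
|milestone/multi-user-server|3|Multiuser server mode|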
Consider the dependencies for milestone/multi-user:
$ svcs -d milestone/multi-user
STATE          STIME    FMRI
disabled       Sep_22   svc:/network/smtp:sendmail
online         Sep_22   svc:/milestone/name-services:default
online         Sep_22   svc:/milestone/single-user:default
online         Sep_22   svc:/system/filesystem/local:default
online         Sep_22   svc:/network/rpc/bind:default
online         Sep_22   svc:/milestone/sysconfig:default
online         Sep_22   svc:/system/utmp:default
online         Sep_22   svc:/network/inetd:default
online         Sep_22   svc:/network/nfs/client:default
online         Sep_22   svc:/system/system-log:default
Milestones are checkpoints in the operating system. Before multiuser mode can be online, rpc/bind and the other services listed must be online as well.
One of the dependent services listed is milestone/single-user, which has its own list of dependencies:
$ svcs -d milestone/single-user
STATE          STIME    FMRI
disabled       Sep_22   svc:/system/metainit:default
online         Sep_22   svc:/network/loopback:default
online         Sep_22   svc:/milestone/network:default
online         Sep_22   svc:/milestone/devices:default
online         Sep_22   svc:/system/filesystem/minimal:default
online         Sep_22   svc:/system/manifest-import:default
online         Feb_21   svc:/system/identity:node
Instead of making all milestones dependent on common services, the milestones are set up as cascading checkpoints. When you change the dependency list for milestone/single-user, you don't need to change the dependencies for milestone/multi-user.
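To see the cascade in one view, the dependency listing can be followed recursively. The sketch below is a minimal take on that idea, assuming svcs -H -o fmri -d prints one dependency FMRI per line; the function name walk_deps is invented here, and shared dependencies are simply printed each time they are reached.

```shell
#!/bin/sh
# walk_deps FMRI [INDENT]
# Print FMRI, then each of its dependencies indented beneath it,
# recursing until a service has no dependencies left to list.
# The body runs in a subshell ( ... ) so each level keeps its own
# copies of $fmri and $indent.
walk_deps() (
    fmri=$1 indent=${2:-}
    printf '%s%s\n' "$indent" "$fmri"
    # -d lists the services this one depends on; -H and -o fmri reduce
    # the output to bare FMRIs.
    svcs -H -o fmri -d "$fmri" 2>/dev/null |
    while read -r dep; do
        walk_deps "$dep" "  $indent"
    done
)

# Example use on a Solaris 10 host:
#   walk_deps svc:/milestone/multi-user:default
```

On a real system the tree for a multiuser milestone can be long, since the same low-level services appear under several branches.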
To change the milestone level of the host, use the svcadm milestone command:
$ svcadm milestone -d [milestone FMRI]
The -d option lets you set your choice as the default milestone. This setting will persist across reboots.
When it comes to shutting down the host, traditional commands such as init are still the preferred way to perform a safe shutdown or reboot.
Sometimes services fail due to unavoidable circumstances. For example, a bad configuration file will prevent the Apache process from starting. If the service fails, it will usually end up in the maintenance state. To correct the problem, you need to know where to look for its cause.
# svcs http
STATE          STIME    FMRI
maintenance    20:51:31 svc:/application/http:apache2
# svcs -x http
svc:/application/http:apache2 (Apache2 Server)
 State: maintenance since Mon Feb 20 20:51:31 2006
Reason: Method failed.
   See: http://sun.com/msg/SMF-8000-8Q
   See: httpd(8)
   See: /var/svc/log/application-http:apache2.log
Impact: This service is not running.
Each service keeps a log with the output from the method script. Most errors will appear in this file, as long as the program writes out errors to stdout or stderr.
# tail /var/svc/log/application-http\:apache2.log
Syntax error on line 23 of /etc/opt/apache2/httpd.conf:
Invalid command 'Kisten', perhaps mis-spelled or defined by a module not included in the server configuration
[ Feb 20 20:50:30 Method "stop" exited with status 0 ]
[ Feb 20 20:51:31 Method or service exit timed out.  Killing contract 957 ]
[ Feb 20 20:51:31 Rereading configuration. ]
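The log file names shown in this article follow a visible pattern: the svc:/ prefix is dropped and the slashes in the FMRI become hyphens. The sketch below applies that pattern; it is inferred from the examples here rather than taken from a specification, and the function name fmri_logfile is made up. The logfile line of svcs -l remains the authoritative source.

```shell
#!/bin/sh
# fmri_logfile FMRI
# Guess the per-service log path from a full FMRI (with instance),
# following the naming seen in the examples: strip "svc:/", turn "/"
# into "-", and append ".log" under /var/svc/log.
fmri_logfile() {
    name=${1#svc:/}
    printf '/var/svc/log/%s.log\n' "$(printf '%s' "$name" | tr '/' '-')"
}

fmri_logfile svc:/application/http:apache2
# prints: /var/svc/log/application-http:apache2.log
```

Quoting the result, as in tail "$(fmri_logfile svc:/application/http:apache2)", avoids having to escape the colon by hand.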
Another option is to check the log of svc.startd, as it is the restarter process for the Apache service.
# tail /var/svc/log/svc.startd.log
Feb 20 20:51:31/3: svc:/application/http:apache2: Method or service exit timed out. Killing contract 957.
Feb 20 20:51:31/520: application/http:apache2 failed
After you have corrected the error, use the svcadm command to clear the maintenance state.
# svcadm clear application/http:apache2
# svcs -x http
svc:/application/http:apache2 (Apache2 Server)
 State: online since Mon Feb 20 21:00:22 2006
   See: httpd(8)
   See: /var/svc/log/application-http:apache2.log
Impact: None.
The important thing to remember is that the Service Management Facility isn't designed to block normal access to programs or processes. If you really need to perform serious testing of Apache httpd or other programs, it's still possible to invoke them from the command line. If a service is in the maintenance state, go ahead and run httpd -t, or sendmail -bD, or whatever command you need. SMF will not interfere with processes that were not started by its own starter.
Chris Josephes works as a system administrator for Internet Broadcasting.
Copyright © 2009 O'Reilly Media, Inc.