Beyond My Wildest Expectations: The Butterfly Effect
You never know what reactions a simple action might produce.
On April 25, 2005, I wrote a 'blog entry entitled What's New in Solaris Express 4/05 (Nevada Build 10). As is my custom, I posted a reference to said entry to OSNews, since I believe
that it's the sort of content generally of interest to readers interested in operating systems... news.
A full-on flame war erupted in the comments of the OSNews article, mostly centered around OpenSolaris. Fintan wrote one response (and another). And so did Alan, on his 'blog. Ok, so far, a big yawn. The thread continued on at more or less the same level of hostility.
But things get a little strange from here on out. Several days later Renai LeMay, a journalist for ZDNet Australia, produced an article entitled OpenSolaris developers defend their baby, documenting the debate. Next, Slashdot stepped in with a story entitled Sun Developers Refute OpenSolaris Vaporware Claims. Taking a further turn for the bizarre, OSNews yesterday posted an article entitled OSNews Troll Succeeds Beyond Wildest Expectations, which rehashes the entire thing!
So the butterfly effect is that my one post to OSNews generated 57 comments, which generated a news story (10 comments), at least one 'blog entry (18 comments), a Slashdot story (279 comments), and another OSNews story (62 comments as of this writing). And all of that led to me writing this 'blog entry.
Careful reading of the posts also reveals a tendency of the media to inflate the emotions involved in the
story. Alan and Fintan's strongly but carefully worded statements became "furious" and "angry" in subsequent
stories. I wonder what will happen when I post news about Solaris Express 5/05? To be honest, I find it tough
to stay positive under the barrage of negativity coming at us. Well, back to my Saturday-- working on the
web pages for the OpenSolaris.org launch.
(2005-05-07 17:05:01.0)
Sun Ray on Solaris 10, and on x64/x86 platforms
Often when talking to customers (for example, in our Beta programs and at conferences like USENIX, LISA, etc.) we've heard a strong, legitimate complaint: no Sun Ray Server on x64 or x86 platforms; no support for Solaris 10. I even witnessed Jonathan get hassled by a customer over this at one point (he correctly pointed out that support was under development).
As a minor digression, Sun Ray is an interesting case study for OS designers, as it returns many of the classic UNIX problems of old to modern hardware-- a Sun Ray server is essentially a large timesharing machine. And lots of groups at Sun-- Solaris, GNOME, StarOffice, and others-- have contributed OS and application fixes to make Sun Ray servers go faster, and scale to more users. For an excellent anecdote about this, see gtik2_applet2 in the DTrace paper, Dynamic Instrumentation of Production Systems.
Well, we're getting there. I just spotted that we have fresh bits, hot off the factory floor. The Sun Ray Server 3 Update 1 Alpha release is available for download. This brings Sun Ray support for Solaris 10 and Solaris x64/x86 support to the "alpha" (i.e. testing) stage. So for those who want and need this, now is your chance to participate! Pull down the alpha release, and test it out! Let us know if you have problems, so we can fix them!
Other enhancements include (see ThinGuy's 'blog for all the details):
- "Regional" hotdesking
- Administrator authentication improvements and new security settings
- VoIP low-latency optimizations [dp: whoa]
- QoS enhancements [dp: reading the description of how this is done with "zero administration", it's rather brilliant]
- Serial port support on the Sun Ray 170
- XKB support, improving accessibility
(2005-05-03 01:30:01.0)
50! Why I like Bloglines
Ok, the obligatory 'blog about 'blogging. Today I checked my bloglines account. If you're feeling a little overwhelmed by the quantity of blogs out there, bloglines can really help to tame the chaos, providing you a centralized place to sift through the new entries in the blogs you read. I like it because, as a web service, I can log into it anywhere in the world, on any machine. It's nice to have stuff centralized. There's also a nice Firefox biff which can notify you of new entries ready for consumption!
One feature I especially like is that bloglines allows you to see how many other bloglines users have subscribed to a particular blog. This of course means that you can watch the stats on your own blog as well. So today I reached fifty bloglines users who are reading my blog; that makes me able to guess that my real readership must be quite a bit higher than that. While I'm dwarfed by some of our blogging giants like Bryan (132) and Eric (114), I'm feeling good about fifty.
An additional cool feature is that a bloglines user can choose to share his or her subscriptions with the world; this means that I even know who some of my 50 subscribers are (surely there is some sort of neat social networking/visualization project to be done here!). So, if you're a bloglines subscriber-- share your subscriptions! To do so, visit the "Account" link, go to "Account Settings" and choose "Yes, Publish my Blog/Blogroll."
In other news, I've posted notification of Solaris Express builds to freshmeat.net.
(2005-04-27 23:30:01.0)
What's New in Solaris Express 4/05 (Nevada Build 10)
Solaris Express 4/05 arrived today, based on Nevada build 10. Don't forget to review the Release Notes before installing. Build 11, which was the intended target, had some problems, so we're going with build 10 instead, and the set of changes is a bit smaller than usual. No large new features were integrated in build 10, but there is a lot of incremental improvement, and about 200 bug fixes. Assuming things go according to plan, the May Solaris Express release will be based on either build 12 or 13.
Notable New Features in Solaris "Nevada", Build 10 (04/2005)
Desktop
- The X.org X server is updated to 6.8.2 final release.
- Annoying mozilla bug (the "5.10.1" bug) fixed.
Performance
- Single-threaded standard I/O performance gets a boost; we pick up about 25% on printf(3c) compared to Solaris 10, and about 2x on putchar(3c). This fixes a regression against Solaris 9.
- Sherry contributed some sophisticated work to reduce cache pollution on Opteron systems by employing non-temporal access (i.e. accesses which do not dirty or displace lines in the L2 cache). Before this change, read(2) and write(2) typically involved loading from a source buffer and writing to the target buffer, and both would result in having lines installed in cache. It turns out that very often we don't access the data being written immediately, which means that to facilitate the copy, we ended up replacing cachelines that we do need. One of the keys was working out which cases benefit from non-temporal access, and which cases are harmed by it.
Developer Support
- Not-exactly-in-Solaris-10 but very worthy of your attention: For Java developers, a JVMTI/JVMPI provider for DTrace-- you can instrument method entry/exit, object allocation/free, garbage collection, etc. and blend it all in with the rest of your DTrace probes! See Adam's and Bryan's articles about this.
Networking
- The TCP keepalive probing period is now tunable via an ndd parameter, tcp_keepalive_interval, and via a socket option, TCP_KEEPALIVE_THRESHOLD. See tcp(7P).
- The S2IO 10-gigabit driver (xge) has been updated; the company is now called Neterion. The update improves performance, and adds some feature enhancements, including Jumbo-frame support.
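As a rough illustration of the keepalive socket option mentioned above, here is a minimal sketch in C. It is not taken from the Solaris sources: the function name enable_keepalive is hypothetical, and it assumes that TCP_KEEPALIVE_THRESHOLD takes an unsigned int expressed in milliseconds (check tcp(7P) on your build before relying on that). On systems that do not define the option, the sketch only turns on SO_KEEPALIVE and leaves the probing period at the system default.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: enable keepalive probing on a TCP socket and,
 * where the platform offers TCP_KEEPALIVE_THRESHOLD, set the idle
 * period before probing starts.
 * Assumption: the option value is an unsigned int in milliseconds.
 * Returns 0 on success, -1 on failure (errno set by setsockopt).
 */
int enable_keepalive(int sock, unsigned int idle_ms)
{
    int on = 1;

    /* Keepalive probing must be switched on per socket. */
    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
        return -1;

#ifdef TCP_KEEPALIVE_THRESHOLD
    /* Per-socket override of the system-wide tcp_keepalive_interval. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPALIVE_THRESHOLD,
                   &idle_ms, sizeof(idle_ms)) != 0)
        return -1;
#else
    (void)idle_ms;  /* option not available here; system default applies */
#endif
    return 0;
}
```

A caller would typically invoke this once, right after socket() or accept(), for example enable_keepalive(fd, 60000) to ask for one minute of idle time.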
Other
- SunVTS updated from 6.0 to 6.1
(2005-04-25 04:25:01.0)
OpenSolaris, Solaris 10 at MySQL User's Conference
This week is the MySQL User's Conference. Stephen Harpster is going to give a talk entitled OpenSolaris: Innovation Happens Everywhere on Wednesday at 11:20am. Attend to learn all about the OpenSolaris project!
Following that, and on the heels of last week's BoF at USENIX '05, I'm throwing together a BoF entitled MySQL and Solaris 10. It is on Wednesday at 9:30pm in the Magnolia Room of the Westin Santa Clara.
During the BoF we hope to talk about what makes Solaris 10 a good platform for running MySQL, and see what else we need to do to improve the OS in order to run MySQL. If you're there, please join us!
(2005-04-17 22:04:10.0)
Live, from Anaheim
Tonight I led the Solaris 10 BoF session at USENIX '05. Bart, Liane, Alan, David Bustos, Matt, and John Clingan were there to help. We also spotted Rich, Jim and other Solaris luminaries. John said that we had about 80 non-Sun folks in attendance, which I think is pretty good. The BoF ran from 8pm-11pm, and we didn't escape the room until about midnight, when David turned out the lights and forced us all up to the hotel bar, where we stayed until 2am! Thanks for coming, everyone!
I give the crowd an "A" for insightful questions, a willingness to share opinions, and a lot of discussion about Sun, Solaris and OpenSolaris in the marketplace, in academia, and in research. For me, it was interesting to contrast the discussion with the one we had at LISA '04, which was focused on issues like DHCP, LDAP, and Jumpstart. I was a little less happy with my own performance-- maybe it was the Claritin, the lack of sleep, the scent of Disneyland in the air, or whatever, but I was less coherent than I had hoped to be. If my introduction to Zones, DTrace, the details of the CDDL license (see also Andy's blog), or anything else was lacking, check out the aforementioned links, or leave me a comment.
If you're in town for USENIX don't miss Liane's Developer BoF tonight (Wednesday evening)! This isn't a replay of Tuesday's BoF. She'll lead a deeper tour of DTrace, SMF, /proc tools and other developer topics. Ok... time for sleep.
(2005-04-13 03:28:13.0)
Smacking Super-Smack into Shape for Solaris
A recent article (and part 2) documented the author's attempts to benchmark MySQL performance on a variety of operating systems. He noted that a popular MySQL benchmark called Super-Smack doesn't compile on Solaris. I hate when this happens-- inevitably this tells the reader that Solaris isn't really a serious platform for MySQL (how could it be, if the benchmark doesn't work?). But from the other benchmarks the author provides, we can see that Solaris performs respectably; and I suspect that MySQL has itself not received as much tuning under Solaris as, for example, under Linux. With the help of performance analysis tools like DTrace, trapstat, etc. we (or you, gentle reader) can fix that.
First, I'd like to clarify one claim in the article: Solaris 10 is bundled with a compiler. That wasn't true in the beta build the author used; but it is true as of the FCS build. So, the benchmark will compile without installing any additional software.
I decided that it was time to get Super-Smack working under Solaris. The first task was to get the program to compile. The configure script ran OK, and I elected to just use the MySQL included in Solaris 10. For more serious benchmarking, I would study this to decide whether to build MySQL myself. So:
$ PATH=/usr/bin:/usr/sbin:/usr/sfw/bin:/usr/ccs/bin
$ export PATH
$ ./configure --with-mysql \
    --with-mysql-lib=/usr/sfw/lib/ \
    --with-mysql-include=/usr/sfw/include/mysql \
    --prefix=/home/dp/super-smack
I then had to make a few edits: src/Makefile needs to link the benchmark with the additional libraries -lsocket -lnsl. A proper autoconf setup should detect this, but... no problem. A few minor edits to C files were also needed:
- Added #include <strings.h> to engines.cc for bzero.
- In query.cc, replaced calls to flock() with calls to fcntl(3c):
    - flock(1, LOCK_EX);
    + fcntl(1, F_SETLK, F_WRLCK);
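One caveat on the flock() replacement: fcntl(2) record locking takes a pointer to a struct flock as its third argument, and F_SETLK returns immediately instead of blocking. A closer stand-in for flock(fd, LOCK_EX) might look like the sketch below. The helper names lock_whole_file and unlock_whole_file are hypothetical and not part of Super-Smack; F_SETLKW is used on the assumption that the caller wants to block until the lock is granted, as flock() does.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: take an exclusive (write) lock on the whole
 * file behind fd, blocking until it is granted.
 * Returns 0 on success, -1 on error with errno set.
 */
int lock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;     /* exclusive lock */
    fl.l_whence = SEEK_SET;  /* l_start is relative to start of file */
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: release the lock taken by lock_whole_file(). */
int unlock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    return fcntl(fd, F_SETLK, &fl);
}
```

The two mechanisms are not exact equivalents: fcntl() locks belong to the process and are dropped when the process closes any descriptor for the file, whereas flock() locks follow the open file.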
So now, it built cleanly! Hooray. Next, I muddled my way through getting mysqld started. Once I did, I had to cope with one more problem: Super-Smack, by way of libmysqlclient, seems to want to access the mysql database via a UNIX domain socket at /var/lib/mysql/mysql.sock. However, the database seems to put that socket in /tmp/mysql.sock. I wasn't sure why, and I decided to sort that discrepancy out later. I hacked things up by putting an appropriate symlink in /var/lib/mysql to work around the problem.
Next, I ran Super-Smack as instructed in the article, and things went somewhat haywire. A quick look revealed that Super-Smack has a fairly conventional design: A parent process forks a bunch of children, which do benchmark activities. When these are finished, they write information back to the parent. I received a variety of error messages, and after applying truss to some abbreviated runs, Jonathan and I decided that the parent super-smack process was exiting prematurely, and failing to collect the data being sent to it by its children. A quick scan of the source code led me to this innocuous looking line of code:
pid_t pid = wait4(-1, 0, 0, NULL);
This is where the master super-smack process waits for its children. My brokenness-sense was tingling. That -1 just looks wrong. And, for Solaris, it is. This first argument, -1, is the pid to wait for. In Linux's wait4, this is interpreted as follows (excerpted from the Linux man page):
< -1   which means to wait for any child process whose process group ID is equal to the absolute value of pid.
-1     which means to wait for any child process; this is equivalent to calling wait3.
0      which means to wait for any child process whose process group ID is equal to that of the calling process.
> 0    which means to wait for the child whose process ID is equal to the value of pid.
So now we know what the author meant: Wait for any child process. While not fully documented (which is a bug), Solaris implements a slightly different ruleset:
< 0    which means to wait for any child process whose process group ID is equal to the absolute value of pid.
0      which means to wait for any child process; this is equivalent to calling wait3.
> 0    which means to wait for the child whose process ID is equal to the value of pid.
So, on Solaris, wait4(-1, ...) instructs the OS to wait for any child process whose process group ID is 1, while on Linux, it does a wait3(). Damn. A final note is that wait4() is not defined by POSIX, the Single Unix Spec, or any standards body I could find. Please, write portable code! One wonders why wait3() wasn't used in the first place. Quickly changing the code fixes the problem.
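One portable way to say "wait for any child" is waitpid(2), which POSIX does define, and for which a pid argument of -1 means any child process. The sketch below is not the actual Super-Smack change; spawn_children and reap_all_children are hypothetical helpers that only mimic the parent/children structure described earlier.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork n children that exit immediately.
 * Returns the number of children actually started.
 */
int spawn_children(int n)
{
    int started = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);       /* child: real code would do its work here */
        if (pid > 0)
            started++;      /* parent: count the successful fork */
    }
    return started;
}

/*
 * Hypothetical helper: reap every outstanding child using the
 * POSIX-defined waitpid(-1, ...). Returns the number reaped.
 */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);

        if (pid > 0) {
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;       /* interrupted by a signal; retry */
        break;              /* ECHILD (no children left) or another error */
    }
    return reaped;
}
```

Because the loop only stops once waitpid() fails, the parent cannot exit while children still have results to report, which is the failure mode described above.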
At this point, I have what appears to be a working Super-Smack on Solaris, and some initial results. I'll intentionally not mention what hardware this was run on, since I've not bothered to perform even rudimentary performance analysis:
$ super-smack /smacks/select-key.smack 10 10000
Query Barrel Report for client smacker1
connect: max=330ms min=6ms avg= 59ms from 10 clients
Query_type num_queries max_time min_time q_per_s
select_index 200000 0 0 8590.39
mpstat(1m) shows that this benchmark spends a lot of time abusing the system call path, and twiddling bits in userland; it's not clear whether this is really a good test of MySQL performance, since the test client and the database have to fight for CPU resources... All in all, not a terrible night's work. I owe Jonathan a big thanks for his help!
(2005-04-06 11:30:01.0)
What's New in Solaris Express 3/05 (Nevada Build 9)
I got chastised for not blogging often enough! I'll try to do more in the coming month.
For now, Solaris Express 3/2005 is just around the corner (I believe it will come out
tomorrow, 3/29/2005); here's a rundown on what's new.
Notable New Features in Solaris "Nevada", Build 9 (03/2005)
Desktop
- New "Never Print Banner" option in the Solaris Print Manager.
Hardware support
- MPxIO (Solaris's Multipath I/O feature) compatibility problems with IBM FAStT900 and FAStT600 arrays have been corrected.
- A significant bug, 5042195 "Only part of disk is usable by fdisk or format on Solaris X86" has been fixed.
- CD-ROM/DVD DMA is now always enabled; this had in the past caused problems with some CDROM and DVD drives. However, the performance benefit is significant. CD/DVD DMA can still be switched off via the configuration assistant. This is also known to cause problems with encrypted DVDs; to fix, disable DMA; this should be fixed when snv_11 comes out.
Security
- A new command, embedded_su(1M) allows an application to prompt for credentials and execute commands as the super user or another user using su(1M) as a backend. This makes it easy (easier) to develop non setuid GUIs which invoke privileged actions. Cool!
Performance
- rand_r(3c), rand(3c) and pthread_once(3c) are faster. malloc(3c) and free(3c) are slightly faster.
Developer Support
- The libc atomic_ops(3c) have been expanded. See atomic_cas(3c), atomic_bits(3c), atomic_swap(3c) and membar_ops(3c). These are great for writing tricky code which maintains portability across ISAs. Thanks to Jonathan for pointing out that I forgot to mention this.
- The kernel gets a suite of handy atomic data manipulation routines, similar to those provided by atomic_ops(3c) (including all the new routines highlighted above). See atomic_ops(9F), atomic_bits(9F), etc. (but note a bug in the man pages: kernel code must #include <sys/atomic.h>, not <atomic.h>).
- plockstat(1m) and lockstat(1m) pick up some new options. plockstat gains "-e" (limit elapsed tracing time), "-n" (limit entries printed in output), and "-v" (print a message to indicate that tracing has started). Both commands acquire a "-x" option, which enables further tuning by setting various DTrace tunables.
Networking
- The Network Layer 7 Cache (NL7C) revises the kernel NCA (Network Cache and Accelerator) by moving NCA's HTTP layer and object cache into the kernel's socket layer. Previously, NCA provided a completely separate TCP/IP stack inside the kernel, in order to provide the highest possible performance for web servers which were NCA-enabled. With the development of the FireEngine TCP/IP stack in Solaris 10, this extra TCP/IP stack can now be expunged. NL7C also further improves upon NCA performance by providing lower first-byte latency. Prefetch for sendfilev(3ext) is also added. Applications which already use the NCA apis are supported without modification. NL7C provides a framework for accelerating other L7 protocols in the future.
- dhcpsvc.conf(4) and the dhcpmgr(1M) gui have a new "Owner IP" option. This allows you to optionally specify which IP address "owns" the dhcp network records a Solaris DHCP server manages. This is used by the server to determine which dhcp_network(4) records it is allowed to allocate. This feature is especially useful in cases where the DHCP server needs to be moved temporarily to a different system or address, and also in cases where the server may not have a stable IP address.
- TCP and UDP ephemeral port selection is now randomized; this uses a high quality random number source in the kernel, raising the difficulty level of forging a valid RST.
Other
- New "poolbind -e" option allows one to easily run a command, binding that command to the resource pool in question.
(2005-03-28 23:20:01.0)
What's New in Solaris Express 2/05 (Nevada Build 7)
Welcome to Nevada! Nevada is our code-name for the next version of Solaris.
For now at least, the uname -r output is 5.10.1 although
that is subject to change. As I did for the last couple of Solaris 10 SX
builds, I'll attempt to keep you abreast of the changes happening in each
SX release. I missed doing one which described the delta between SX 11/04 (s10_72)
and the FCS build of Solaris 10. The most important of those changes include:
- Inclusion of gcc for SPARC, x86 and AMD64 (/usr/sfw/bin/gcc).
- Intel 10GB NIC driver (ixgb driver)
- svcadm (part of SMF) picked up a synchronous mode (via -s)
- BIND 9 became the default name server. BIND 8 was removed.
- A large fraction of binaries delivered (including kernel modules) are now cryptographically signed. See elfsign(1).
Desktop
- Updated Xorg from 6.8.0 to 6.8.2RC2, including numerous bug fixes and new hardware support (see the X.org release notes). The final version of 6.8.2 will be available in a future Solaris Express build.
- An annoying bug in the /usr/sfw/bin/mozilla prevents it from starting up properly. Edit the OS_VERSION check in the script to work around the problem.
- You can now double-click .jnlp (java web-start) and .jar files to run them under GNOME.
Hardware support
- via823x SADA audio driver on x86 and AMD64 platforms.
- Chelsio 10gb NIC driver available on all platforms (SPARC, x86, AMD64).
Security
- 64-bit openssl(1) command available. Solaris already ships with a 64-bit openssl library. The openssl command provides a tool for using various cryptography functions of OpenSSL's crypto library from the shell.
- Support for a PKCS#11 "MetaSlot". This is an extension to the Cryptographic framework which presents a single slot which is the union of the capabilities of other slots which are loaded in the framework.
- IKE gets a performance boost by using the encryption framework. IKE is also now fully compliant with RFC 3947 (NAT-T support).
Storage
- iSCSI devices are now supported via the new iscsiadm(1m) command.
- The fcinfo(1m) utility is now available; this utility can be used to list fibre channel ports on the system in a concise and clear fashion.
Performance
- Hierarchical (Multi-level) Lgroup support. Solaris has an abstraction called an Lgroup (latency group) which is the way in which the system tracks NUMA system topology. Traditionally, Solaris has run on systems with no difference in latency (traditional SMP systems) or only two levels of latency (local memory and remote memory). Newer system designs have more levels. For example, 4-CPU Opteron systems have 3 such levels; 8-way Opteron systems may have 4 levels. This project enables better performance on ring and ladder system topologies, and picks up performance wins on Oracle (TPC-SO), Fluent, and other benchmarks. There are some new liblgrp APIs to go along with this work (lgrp_latency_cookie(3LGRP)).
- Faster memmove(3c) (anywhere from 0-400%, 40% is typical) on 32-bit x86 platforms. AMD64 performance of memmove(3c) and bcopy(3c) was also improved.
- Improved context switch performance on AMD64.
- Much improved performance on 32-bit x86 string functions: strcpy(3c) (as much as 50%), strlen(3c) (as much as 25% on long strings) and strchr(3c) (as much as 45% on long strings).
- New TCP_INIT_CWND TCP socket option allows the congestion control window calculation to be overridden with a user specified value. See tcp(7p) for full details.
(2005-03-01 21:20:00.0)
Squid startup: Extreme Makeover with SMF
I run a web proxy server for folks in the office; we use it as a longterm testbed for Solaris.
But, in the insanity leading up to the release of Solaris 10, I've had little time to work
on it. Recently I got a new server to host the cache; and so I've been busy putting it together.
We've always used Squid as our proxy software.
Personally, I have some qualms about Squid's design, but with five years of experience using it,
I think we'll probably stick with it for now. It's a curious thing that there doesn't appear
to be a significant competitive open source alternative to Squid (the forthcoming Apache 2.1 is
moving mod_cache out of "experimental" support so perhaps that will be worth considering?).
Setting aside design complaints, being able to effectively administer Squid is a big priority, so recently I worked on getting it properly under the control of the Service Management Facility (SMF). It's also a good example of how to improve a program's administrative controls with SMF.
The first task was to look through Squid's existing start/stop/restart capabilities. There's a RunCache script, which I had always thought was the supported way to start the daemon. Looking at the documentation, RunCache is now apparently obsolete, but still installed along with squid anyway (sigh). RunCache has many problems which I won't detail here.
In the same neighborhood, there is the squid binary, which has a number of relevant command line options:
Usage: squid [-dhsvzCDFNRVYX] [-f config-file] [-[au] port] [-k signal]
...
-f file   Use given config-file instead of /aux0/squid/etc/squid.conf
...
-k reconfigure|rotate|shutdown|interrupt|kill|debug|check|parse
          Parse configuration file, then send signal to running copy (except -k parse) and exit.
-s        Enable logging to syslog.
...
-z        Create swap directories
...
-N        No daemon mode.
...
To add to the complexity, squid has its own restarter directly built into itself. This is somewhat suboptimal, as SMF tends to trump these facilities, and allows monitoring software to have visibility into restart events. Anyway, we can make use of the -k option to control the daemon to some degree, and give the administrator the power to create multiple service instances if we use the -f option. In my testing, I found the -k reconfigure option to be somewhat useless, so I decided not to implement an SMF 'refresh' method. Perhaps I missed something?
Another problem we'd like to solve is that Squid doesn't operate properly "out of the box." First, one must run the daemon with the -z option in order to create the cache metadata. I'm not sure why the squid team made this decision; I certainly don't think it's a good one. Our startup scripting can simply take care of cache creation for the administrator. After working out the right set of dependencies for the cache as I'd set it up (./configure --disable-internal-dns --enable-ssl --prefix=/aux0/squid --enable-storeio='ufs aufs'), I prepared a service manifest file which captured those dependencies; the dependencies look like this:
$ svcs -d squid
STATE          STIME    FMRI
online         Jan_26   svc:/milestone/network:default
online         Jan_26   svc:/system/filesystem/local:default
online         Jan_26   svc:/network/dns/client:default
online         Jan_26   svc:/milestone/sysconfig:default
The network milestone is the stable way to depend on "networking being up on the box." A buglet in some of the S10 FCS manifests (notably, Apache) is that some of them have finer grained, and less stable dependencies (for example, on network/physical). When stable dependencies in the form of milestones are available, please use them.
Note that the default mode for squid is to use its own internal DNS library (ugh), so you may or may not need the DNS dependency. This is (double ugh) a compile time setting. Regardless, you'll want to have an /etc/resolv.conf file present, and the network/dns/client manifest checks for that.
Next, I worked on revising the startup script to be much more intelligent. To start up the cache, it uses squid's -k parse option to decide whether the configuration file has a valid syntax. If not, it exits with the $SMF_ERR_CONFIG error code, which indicates a configuration problem. Next, it populates the cache directory using squid -z as needed. Finally, it starts up the cache. Every failure logs a clear and detailed log message.
I also added a couple of service properties, which the script uses to set its behavior. Ideally, these will be automatically and correctly generated from the configure script in the future; for now, just tweak the manifest before importing it. In the example manifest, squid has been configured to be installed into /aux0/squid. You will want to search the file and alter all of the places which reference /aux0/squid, adjusting them for your installation (you can also use svccfg after you import the manifest to make corrections). Here is a draft of the network/http-proxy:squid service manifest; and a draft of the svc-squid startup script. To install:
- Tweak squid.xml to reflect the Squid installation directory.
- Copy the svc-squid script to the location reflected by squid.xml.
- svccfg import /path/to/squid.xml
- svcadm enable squid
[Sigh. Sometimes I feel like I'm just too slow to post.
Since I started this post a month ago,
some of the work Trevor posted obviates mine. While I'm not happy about having multiple
similar solutions to a single problem I think this represents a substantial improvement, and
it did take quite a while to refine into the current state. It has also been checked
and nitpicked by the SMF team, so I'm optimistic that it is roughly correct. One interesting
result is that my dependencies are different than the set which Trevor worked out.
Determining the right set of dependencies is, at present, a bit of a black art.]
(2005-03-01 04:45:01.0)
I've been in Singapore for nearly a week, since leaving Tokyo. Sadly, I've spent much of the time sick, with
a moderate cold. I had thought I was on the mend, but today I mostly lost my voice, just in time for my
training presentations! Sigh...
Singapore is a pretty amazing mix of cultures. Before I got sick, I managed to get to the Lunar New Year's celebration (as you can see, it's the year of the rooster). There are some more pictures here.
(2005-02-24 00:25:00.0)
Jonathan and I have been spending the week in Tokyo, doing a series of training presentations and customer visits on various Solaris 10 topics. The culture shock is moderately intense, but it helps that everyone is friendly, the city is immaculately clean, and that there is always something new right around the next corner. We've had some outstanding food, and had a couple of days to do the usual touristy stuff. I posted some pictures of Tokyo. Tomorrow we will present to 250 customers, and then on Friday I will go on to Singapore...
(2005-02-15 19:52:28.0)
Remote, Secure Zone Console Login
I have heard from a number of customers that folks would like remote login to zone consoles. In particular, they would rather not give out logins to the global zone in order to allow zone logins. (Really: I don't spend all of my time on the zones console...).
Fortunately, we can handle this in a nice way already. (Disclaimer: Please note that as stated by the script, the following techniques have not been subject to a rigorous security audit. I believe this technique to be sound, but neither I nor Sun warrant it to be so.)
To start, we'll add a user account to /etc/passwd for each zone we want to set up this way:
z1:x:999999:999999:xanadu-z1:/tmp:/opt/extras/zoneshell
^D
# pwconv
# passwd z1
New Password: xxxyyy
Re-enter new Password: xxxyyy
passwd: password successfully changed for z1
The zoneshell script is here; the script itself is very simple: it looks up the entry in /etc/passwd and executes zlogin -C for the zone named in the GECOS field.
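To make that description concrete, here is a minimal sketch of the same idea written in C. It is not the linked zoneshell script and has had no security review of its own: the function names, the /usr/sbin/zlogin path, and the zone-name check (a conservative character whitelist with a 63-character limit) are all assumptions made for illustration.

```c
#include <ctype.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical check: accept only names that start with an
 * alphanumeric character and otherwise contain alphanumerics,
 * '-', '_' or '.'. Assumed to be a safe subset of legal zone
 * names; it also rejects anything that looks like an option.
 * Returns 1 if acceptable, 0 otherwise.
 */
int zone_name_ok(const char *name)
{
    size_t len = strlen(name);

    if (len == 0 || len > 63)
        return 0;
    if (!isalnum((unsigned char)name[0]))
        return 0;
    for (size_t i = 1; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '-' && c != '_' && c != '.')
            return 0;
    }
    return 1;
}

/*
 * Hypothetical login shell body: look up the caller's passwd entry
 * and exec "zlogin -C" for the zone named in the GECOS field.
 * Only returns (with -1) if the lookup, the check, or the exec fails.
 */
int zone_console_login(void)
{
    struct passwd *pw = getpwuid(getuid());

    if (pw == NULL || !zone_name_ok(pw->pw_gecos)) {
        fprintf(stderr, "zoneshell: no usable zone name in GECOS field\n");
        return -1;
    }
    /* Assumed install path for zlogin. */
    execl("/usr/sbin/zlogin", "zlogin", "-C", pw->pw_gecos, (char *)NULL);
    perror("zoneshell: exec zlogin");
    return -1;
}
```

With the passwd entry shown above, the GECOS field "xanadu-z1" passes the check and becomes the zone argument to zlogin -C.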
Finally, we need to give the z1 account the ability to run zlogin; we do that by modifying the RBAC attributes for the z1 user (in /etc/user_attr).
z1::::profiles=Zone Management
^D
So, here's what it looks like:
Password:xxxyyyy
Last login: Tue Jan 25 13:54:01 2005 from xxx
warning: using experimental, unsupported 'zoneshell'
[Connected to zone 'xanadu-z1' console]
I'd appreciate any feedback on whether this is helpful, or not!
To reiterate: this code is experimental, and has not been audited for its security characteristics. Use of this script is AT YOUR OWN RISK. Please use this as an example, from which you could derive your own implementation.
(2005-01-26 19:00:00.0)
Permalink
Comments [3]
Trackback: http://blogs.sun.com/roller/trackback/dp/Weblog/remote_zone_console_login
Clearing up confusion about zlogin, zones, consoles, and terminal types
Thanks to bloglines' nice search feed feature, I found this thread on the Solaris x86 Yahoo group.
Phillip asks why, when he issues a zlogin -C to a zone, it asks him which terminal type he'd like to use. For those who might not have seen zlogin before, it's a tool patterned on the syntax of rlogin and ssh; one uses it to enter a zone from the global zone. The "regular" way one would use it is as follows:
$ zlogin myzone
This will insert you into the zone with an appropriate subset (including $TERM) of your environment propagated from the global zone. So, if you are using an xterm and your $TERM is "xterm", then that will be propagated correctly into the zone. This is all implemented using pseudo-terminals (the same things used to make telnet, ssh, etc. do what they do); they are pretty easy to deal with-- when you need one, you create it from nothing, then start some processes which are connected to it in some fashion. You have full control of the process environment. In this mode, zlogin will never ask you what terminal type you have; if $TERM is unset in your global zone shell, it will either be unset, or default to something like dumb inside the zone, depending upon your shell.
Zones also possess a virtual console, which can be accessed using the zlogin -C command. And this is where Phillip is having problems. A console is fundamentally different from a pseudo-terminal. While a pseudo-terminal vanishes once you stop using it, a console (real or virtual) keeps its state; you can connect to and disconnect from it at any time. Users familiar with using the tip(1) command or other serial console systems know that they must often tweak some settings after attaching to a console. Think of the console as a television-- the programs are always playing, regardless of whether the set is on or not; you can choose to watch the set or not by turning it on (i.e. connecting to the console).
Phillip recalls having already answered that question when he installed the system as a whole. In a subsequent post he is more critical, since it isn't intuitive why we ask for this information again. Since this is my work, hopefully I can show why this isn't "sloppy" as Phillip asserts, but rather an unavoidable artifact of the way UNIX consoles function.
To understand this, we need to turn to another important distinction: the terminal type of the system's console should usually be set to reflect the kind of hardware which comprises the physical system console. On Sun's SPARC boxes, this is sun and on x86 we have sun-color. This is important, because these terminal types are pretty much incompatible with, for example, the xterm terminal type.
On the other hand, if a machine's console is instead set to be one of its serial ports, and is accessed over a tip line, then the default terminal type is usually set to something fairly benign like xterms (xterm-small) or vt100 or the like-- but this setting must be made by the administrator because there is no protocol for serially connected terminals to identify their terminal type, a limitation of the hardware.
Zones emulate the latter sort of connection-- a zone console is analogous to a serially connected tip line. At one end is your terminal, the type of which is not automatically known to the console at the other end. It is probably an xterm (xterms), a gnome-terminal, a dtterm, or the like. It might also be a vt220, a wyse or any of hundreds of others. So, just as we do at first system boot (if we can't work out what type of terminal we are connected to by querying the openprom device), we query the user the first time the zone is booted. After that, we'll remember this setting. I suppose that we could have just defaulted to something such as 'vt100' but that also seems unfriendly; the sysid tools (the stuff that asks you for your hostname, timezone, etc.) make extensive use of curses, which tends to spam your terminal with garbage if its idea of your terminal type doesn't match your terminal hardware (or emulator). We certainly can't default to the system's default setting, since that is highly unlikely to be compatible with your window system terminals; if the zone operates as though the terminal is sun and you are using an xterm, you won't be pleased.
It's also worth mentioning that you can automate away all of these first-zone-boot questions by employing an /etc/sysidcfg file.
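As a rough illustration, something like the following could drop a minimal sysidcfg into a zone before its first boot. Treat it as a sketch under assumptions: the write_sysidcfg helper and its three parameters are hypothetical, the keyword values (locale, timezone, no name service) are placeholders for your site, and any keyword left out (root_password, for instance) will still be asked for interactively.

```shell
#!/bin/sh
# Sketch only: write a minimal sysidcfg into a zone's /etc directory.
# The zone root path, hostname, and terminal type are hypothetical
# parameters; the keyword values below are placeholders.

# usage: write_sysidcfg <zone-root> <hostname> <terminal-type>
write_sysidcfg() {
    zoneroot=$1 host=$2 term=$3
    cat > "$zoneroot/etc/sysidcfg" <<EOF
system_locale=C
terminal=$term
network_interface=PRIMARY {
    hostname=$host
}
security_policy=NONE
name_service=NONE
timezone=US/Pacific
EOF
}
```

Usage would be along the lines of write_sysidcfg /zones/myzone/root myzone xterm, run from the global zone after the zone is installed and before it is first booted, assuming /zones/myzone is the zone's zonepath.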
Next up-- how do you change the terminal type if you've made a mistake during the sysid configuration? You'll know this happened if your screen is filled with gobbledygook characters when you'd normally see the "what is your hostname?" question. It's nice that you have the non-console zlogin available when you encounter situations like this. To repair things, log off of the zone console, and run:
# zlogin myzone /usr/sbin/sys-unconfig
This will halt the zone after blanking it. Boot the zone back up, log onto the console, and start again.
[Update: It's not the case that after you specify the terminal type for the sysid tools, this will automatically become the console terminal type; arguably, this would be good, but it also doesn't match the behavior of earlier Solaris releases. We'll take a look. See the tip below for how to set your console's terminal type]
What if you changed your mind about what the default terminal type of the console ought to be? The classic "big hammer" method is to simply run the sys-unconfig utility; this has the downside of pretty much blanking your system's networking configuration, but it is effective.
In older versions of Solaris, you can also edit the ttymon line in /etc/inittab. Starting in recent builds of Solaris 10, all of this is controlled by SMF, the new Service Management Facility; as a result, changing the terminal is pretty simple. To check your current setting:
$ svcprop -p ttymon/terminal_type system/console-login
sun
To see all of the ttymon family of properties, issue:
$ svcprop -p ttymon system/console-login
ttymon/device astring /dev/console
ttymon/label astring console
ttymon/modules astring ldterm,ttcompat
ttymon/nohangup boolean true
ttymon/prompt astring \`uname\ -n\`\ console\ login:
ttymon/timeout count 0
ttymon/terminal_type astring sun
And to change your console terminal type (as root):
# svccfg -s system/console-login 'setprop ttymon/terminal_type = sun'
So now we have a supported, upgrade-safe way to change all of the elements of the console's configuration. Sweet!
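If you find yourself doing this on more than one box, the steps can be wrapped up roughly as follows. This is a sketch, not a supported tool: the run and set_console_term helpers are hypothetical, DRY_RUN defaults to 1 so the commands are only printed, and the svcadm refresh step reflects my understanding that the service needs a refresh before it sees the edited property, which you should verify on your build.

```shell
#!/bin/sh
# Sketch: set the console terminal type via SMF and read it back.
# With DRY_RUN=1 (the default here) commands are printed, not executed.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: set_console_term <terminal-type>
set_console_term() {
    run svccfg -s system/console-login "setprop ttymon/terminal_type = $1"
    # Assumption: refresh so the running service picks up the new value.
    run svcadm refresh system/console-login
    run svcprop -p ttymon/terminal_type system/console-login
}
```

Calling set_console_term xterm prints the three commands; setting DRY_RUN=0 as root would actually run them.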
(2005-01-23 17:30:00.0) Permalink Comments [3]
Trackback: http://blogs.sun.com/roller/trackback/dp/Weblog/clearing_up_confusion_about_zlogin
More on Bootchart for Solaris
So Eric let the cat out of the bag on our reworked BootChart for Solaris. Eric posted one image; here is another: the bootup of a single Solaris Zone. I'm pretty happy with this, as we boot in only about 7 seconds.
This was an interesting experience because Eric and I had not previously worked together very closely. I had a great time doing this, and because Eric immediately stuck everything into a Teamware workspace, we were able to work simultaneously. Eric and I both worked on the D scripting, and somehow a joke about "wouldn't it be nice if the script output XML?" turned into our strategy. This turned out to be a good decision, as I hate writing parsers; instead we just let SAX do the work. We were able to maintain the split of having boot-log generation in one self-contained component, and the log processing in another. Because the XML logs are in an easily parsed format (as opposed to parsing the output of top and iostat), they can be useful to anyone doing boot analysis on Solaris. We've already had some such requests. I'm sure Eric will have more to say about the implementation so I'll leave it to him, except to say that some of the visual design changes can be blamed on me, taking inspiration from Edward Tufte's work.
Something else which fell out of this experience is that it's easy to use the log gatherer on any collection of processes which start up (as we saw in the zones example, above). We hope that this will be helpful in highlighting performance wins in the startup of complex ISV programs.
Following the experience of Linux developers, we've also found a series of bugs in Solaris with this tool. Let's start with an easy one, and the first one I found. The bug is visible in this chart from xanadu, my Shuttle SN45G4. Because we don't have support (sadly) for the on-board ethernet on this box, I had inserted another network card (I won't name the vendor, as I don't want to put them on the spot). If you look carefully at this bootchart, you'll see that the ifconfig process is spending a lot of time on the CPU. What's up with that? A brief investigation with DTrace made it clear that the driver had code like the following (shown here reduced as pseudo-code) in the attach(9E) codepath (attach is the process by which a driver begins controlling a hardware device):
for (i := 0 to auto_negotiation_timeout) {
    if auto_negotiation_complete()
        return success;
    wait_milliseconds(100);
}
Which all looks fine except that wait_milliseconds() (a function defined by the driver) is a wrapper around drv_usecwait(9F) ("busy-wait for specified interval"). Busy is of course the problem. drv_usecwait is really more about waiting for short intervals for various information to become ready in various hardware registers. Busy-waiting 100 milliseconds at a time is practically forever, and ties up the CPU just spinning in a loop. The authors almost certainly meant to use delay(9F). I filed a bug, and hopefully we'll have it fixed soon (since this driver comes to us from a third party, they request that we let them make the code changes). Fun, eh?
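The same distinction shows up outside the kernel. As a loose userland analogue (this is shell, not driver code, and the wait_until helper is hypothetical), here is a polling loop that sleeps between checks rather than spinning, so the CPU is free for other work while it waits:

```shell
#!/bin/sh
# Shell analogue (not driver code): poll a condition, sleeping between tries.

# usage: wait_until <tries> <interval-seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every try fails.
wait_until() {
    tries=$1 interval=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        # sleep blocks, so the CPU can run other work in the meantime,
        # unlike a spin loop that burns cycles until the time is up.
        sleep "$interval"
        tries=$((tries - 1))
    done
    return 1
}
```

For example, wait_until 30 1 test -e /tmp/ready checks for a file once a second for up to thirty seconds; the file name is just an illustration.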
Another two issues we spotted concern inetd, which has been rewritten from scratch in Solaris 10; it is now a delegated restarter, which basically means that it takes some of its direction from the system's master restarter (svc.startd). The behavior we noticed sticks out on any of the boot charts, including the zone boot chart mentioned above: inetd is starting a lot of very short-lived ksh processes. Why? When I first spotted this, I used DTrace to work out the answer, as follows:
# dtrace -n 'proc:::create/execname=="inetd"/{ustack();}'
... restart inetd ...
0 12188 cfork:create
libc.so.1`__fork1+0x7
libc.so.1`wordexp+0x16f
inetd`create_method_info+0x45
inetd`create_method_infos+0x2f
inetd`read_instance_cfg+0xc8
inetd`process_restarter_event+0x171
inetd`event_loop+0xfd
inetd`start_method+0x91
inetd`main+0xcb
inetd`0x8054712
Ahh, so we stumble upon an (embarrassing) implementation artifact of libc-- it uses ksh to help it implement the libc routine wordexp(3C). So, every time inetd needs to wordexp() something, we wind up running a ksh. We can also see that this is not severely impacting performance, but we would like to get this fixed. Personally, I'd like to see wordexp() fixed to not rely upon ksh at all.
Another somewhat more subtle issue is something that SMF engineers like Liane are still looking at. It's also visible in the zone boot chart. It appears that some services (such as nfs/client) are delayed in coming on-line because it is taking the startd/inetd cabal a while to mark their dependent services as online, even though that shouldn't really entail much work. We can see this as follows:
$ svcs -l nfs/client
fmri svc:/network/nfs/client:default
name NFS client service
...
dependency optional_all/none svc:/network/rpc/keyserv (online)
dependency optional_all/none svc:/network/rpc/gss (online)
dependency require_all/refresh svc:/milestone/name-services (online)
$ svcs -l network/rpc/gss
fmri svc:/network/rpc/gss:default
name Generic Security Service
enabled true
state online
next_state none
state_time Fri Jan 07 18:22:32 2005
restarter svc:/network/inetd:default
dependency require_all/restart svc:/network/rpc/bind (online)
dependency optional_all/none svc:/network/rpc/keyserv (online)
Since nfs/client depends upon GSS, it can't be started by startd until GSS is online. Liane and Jonathan have offered up some theories about why this is happening, but we've all been engaged on higher priority work, and so we haven't had much time yet to dig deeper. This serialization point appears to be costing us roughly 1 full second on boot, so it's something we need to look at further. Have a great weekend folks!
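For anyone chasing a similar chain by hand, a small filter over svcs -l output saves some squinting. This is only a sketch: list_deps and pending_deps are hypothetical helper names, and they assume the dependency lines have the four-column shape shown in the svcs -l nfs/client output.

```shell
#!/bin/sh
# Sketch: pick the dependency lines out of `svcs -l` output on stdin.

# Print "<fmri> <state>" for every dependency line.
list_deps() {
    awk '$1 == "dependency" { print $3, $4 }'
}

# Print only the dependencies that are not yet online.
pending_deps() {
    list_deps | awk '$2 != "(online)" { print $1 }'
}
```

Typical use would be svcs -l nfs/client | list_deps, or pending_deps while a boot is still in progress to see what a service is waiting on.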
(2005-01-07 19:00:00.0)
Permalink
Comments [8]
Trackback: http://blogs.sun.com/roller/trackback/dp/Weblog/more_on_bootchart_for_solaris