[Simh] SIMH performance

Mon Mar 5 09:30:57 EST 2012

On Friday, March 02, 2012 at 5:23 PM, Michael Bloom wrote:
> It is known that SIMH uses a lot of CPU time, apparently when it has no
> work to do.
>
> I used strace to capture 10 seconds worth of system calls from SIMH on a
> (2 core) machine with 4800 bogomips per core.
> 
> The simulation was pdp11/73 running 2.11BSD

simh has provisions to accommodate this concept with 'idle' support.
On the VAX simulator, entering the command 'set cpu idle' enables the underlying capability which lets the simulator idle when the simulated system isn't doing anything.  The process of detecting how/when a simulated OS is idling is very OS (and possibly OS version) specific.  The VAX simulator (in the codebase at https://github.com/markpizz/simh/ downloadable as https://github.com/markpizz/simh/zipball/master) has different means to detect the idle condition for different OS environments.  The 'SET CPU IDLE' command takes an optional argument (=OSTYPE) which specifies the OS the simulator will be running (VMS is the default).  Other options include: NETBSD, ULTRIX, ULTRIXOLD, OPENBSD, QUASIJARUS, 32V.

What constitutes an OS idling for each of these cases has been determined by empirical observations and examination of OS source code.  Since NetBSD and OpenBSD are still actively developed operating systems, new versions of these OSes are moving targets with regard to providing idle detection.  At this time, recent versions of OpenBSD have veered from the traditional OS idle approach taken in the other BSD derived OSes.  Determining a reasonable idle detection pattern does not seem possible for these versions.

Comments in the pdp11_cpu code suggest that Bob added Idle support.  I'm not a pdp11 user so I don't have direct experience there, but I suspect he probably didn't test with 2.11BSD.  Please suggest what might be appropriate for this operating system....

> In ten seconds, there were 35610 calls into the kernel, averaging about
> 356 per second.
> 
> During those 10 seconds, there were many unsuccessful calls:
> 
> 23526 read calls requesting one byte and returning 0 bytes
>   5990 accept calls returning EAGAIN,
>   5990 recv calls returning EAGAIN
> -----
> 35606 unsuccessful calls
> 
> and a few successful calls:
>       1 read call that returned 1024 bytes,
>       1 write call returning 4096 bytes
>     100 gettimeofday calls returning 0 (SUCCESS)
>       2 lseek calls returning 0
>   -----
>     104 calls were successful,  less than 3% of all calls

MUCH of this can be avoided if you configure your simulator to ONLY have the devices you actually intend to use.  At the hardware level, the system calls you see will happen for a given simulator setup (i.e. a set of enabled devices) almost without regard to actual use of any of these devices.

I made some changes in pdp11_vh (in the code at https://github.com/markpizz/simh) which will dramatically reduce the host OS overhead without negatively affecting any VH device behaviors (there was a bug which caused the desired I/O polling to happen about 3-4 times every poll interval instead of the needed single poll).
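
To illustrate the kind of problem described above (a poll running several times per interval instead of once), here is a minimal sketch of a once-per-interval guard.  This is NOT the actual change made in pdp11_vh; the names poll_gate, poll_gate_init and poll_gate_due are hypothetical and the time source is assumed to be a microsecond counter supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical poll-gate state; not taken from pdp11_vh.c. */
typedef struct {
    uint64_t interval_us;   /* desired time between host polls */
    uint64_t next_due_us;   /* earliest time the next poll may run */
} poll_gate;

/* Initialise so that the first call to poll_gate_due() fires immediately. */
static void poll_gate_init(poll_gate *g, uint64_t interval_us)
{
    g->interval_us = interval_us;
    g->next_due_us = 0;
}

/*
 * Return true at most once per interval.  The caller issues its
 * host system calls (read/accept/recv) only when this returns true,
 * so extra invocations within the same interval cost no kernel entry.
 */
static bool poll_gate_due(poll_gate *g, uint64_t now_us)
{
    if (now_us < g->next_due_us)
        return false;
    g->next_due_us = now_us + g->interval_us;
    return true;
}
```

With a guard like this, a device service routine that happens to be entered three or four times within one interval would still touch the host OS only once.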

> There is a lot of spinning apparently going on.   What would be the
> downside of issuing select calls or FIONREAD ioctls in the thread(s)
> doing IO?  Or perhaps sleeping until the highest resolution clock
> expires and having file descriptors generate SIGIO if  IO becomes
> possible during the interim?

Until the current codebase at https://github.com/markpizz/simh, ALL of simh's I/O was done in the same single thread which also executes the simulated instructions.  There are no 'thread(s) doing IO'.  The new code continues this original design, but optionally allows disk I/O to RQ and RP devices and tape I/O to TQ devices to be performed in separate threads.  The networking layer also uses separate threads to perform reads and writes.
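
For reference, the FIONREAD check raised in the question might look like the sketch below on a POSIX-like host.  The helper name bytes_available is hypothetical and nothing like it is claimed to exist in simh; some hosts need a different header for FIONREAD (or do not support it on every descriptor type).

```c
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Return the number of bytes that can currently be read from fd
 * without blocking, or -1 if the ioctl fails.
 * Hypothetical helper for illustration only.
 */
static int bytes_available(int fd)
{
    int n = 0;

    if (ioctl(fd, FIONREAD, &n) < 0)
        return -1;
    return n;
}
```

Note that in a single-threaded poll loop this still costs one system call per poll, so by itself it replaces an empty read() with an ioctl() rather than removing the kernel entry.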

> 
> You could use, for example,  sigsuspend() with a signal mask that says
> to sleep until either a high resolution timer (higher than your desired
> clock resolution)  expires or until a SIGIO indicates that i/o is
> available.  In the latter case, you could use select()  to tell which fd
> or fds io is possible on.
> 
> Alternately, you could use the timeout argument of select(), and not
> have to deal with SIGIO at all. select would return either after a
> timeout, or when one of the file descriptors specified is ready.
> 
> Perhaps useful, to provide examples, might be GNU emacs, which has been
> using means such as these  (or whatever method is available on the host
> system) to perform I/O without wasted cycles, since the mid 1980's.
> 
> Note:  by its very nature, strace slows things down.  Without it, there
> would have been more calls, both successful and unsuccessful.
> Schrodinger's Penguin, anyone?
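
For readers following along, the select()-with-timeout idea described in the quoted text can be sketched as follows.  This assumes a POSIX host and a single descriptor; the function name wait_readable is hypothetical, and this is only an illustration of the suggestion, not code from simh.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Sleep until fd becomes readable or timeout_us microseconds elapse.
 * Returns 1 if fd is readable, 0 on timeout, -1 on error
 * (errno is left as set by select(), e.g. EINTR).
 * Hypothetical helper for illustration only.
 */
static int wait_readable(int fd, long timeout_us)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_us / 1000000L;
    tv.tv_usec = timeout_us % 1000000L;

    /* The process is descheduled here instead of spinning. */
    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

A caller would pass the time remaining until the next simulated clock tick as timeout_us, so the wait ends either when input arrives or when the tick is due.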

Simh's goals are to have all of the simulators and simulated devices work on almost any host platform, and to work the same on each of them.  These goals are one of the factors which has driven the single-threaded design.  Another factor is the precisely deterministic behavior that results when the same thread executes everything.

Adopting a different I/O model for simh means that any device which uses it would have to be rewritten to accommodate this new model.  You may want to read 0readme_AsynchIO.txt in the code at https://github.com/markpizz/simh for some further discussion on this subject, how such changes affect simulators and their devices, and what has already been done.

- Mark Pizzolato