I purchased a 110 MHz SPARCstation 5 many years ago and installed NEXTSTEP (3 or 4). Things went great for quite a while, and it served as my sendmail host. Unfortunately, I began to see a problem where the machine would disappear from the network. If I was already logged in at the machine, I could determine that there were errors indicating that no more memory could be allocated. This seemed cyclic in nature, approximately once a month. Rebooting without power-cycling would always fix the issue for another month. However, as the amount of spam flowing through my server increased, the time between network bugs shortened. It got to the point where I wanted to back up data, and I noticed that I could not even transfer a large file over the network without triggering the network bug. From a fresh reboot, there was clearly a finite number of packets that could flow in or out of the 10baseT card before it stopped working.
Has anyone heard of this? When I spoke to NeXT/Apple before installing this, they said I was one of only a handful of people who were running the full NEXTSTEP instead of just installing OpenStep on Solaris, where there were thousands of users. Unfortunately, by the time I actually experienced these problems, any kind of support at Apple was long gone.
I see on this forum that people recommend the 100baseT option card. That would be great here, since I upgraded to a faster hub when the new Mac OS X Servers came out with 100baseT (and eventually gigabit). But I worry that the problem would just happen faster.
Any ideas? Seems suspiciously like a memory leak in kernel memory, and a network driver would be one of the few things that needed memory from a potentially limited pool. But it shouldn't be broken like this. Anyway, I will welcome any comments or discussion. It might be fun to turn this machine on again and do something useful with it.
but with buggy hardware/drivers on the x86 I've experienced the same sort of thing... Although I'd be interested to know if you were on a managed switch? Also, if
netstat -ni
produced any interesting info, as far as errors? Serious stuff gets written to /usr/adm/messages.
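If the box still boots, a quick way to check that log is a small grep helper. The log path is NEXTSTEP's standard /usr/adm/messages; the search strings are guesses based on the "out of memory" symptoms described, not the exact kernel wording:

```shell
# scan_log: look for kernel memory-allocation complaints in a syslog-style
# file. Pass a path, or let it default to NEXTSTEP's /usr/adm/messages.
# The error strings are assumptions, not the exact NEXTSTEP messages.
scan_log() {
    grep -i -E 'out of memory|unable to allocate' "${1:-/usr/adm/messages}"
}
```

Running `scan_log` right after the interface wedges, and again after a reboot, would at least confirm whether the allocation failures line up with the hangs.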
Quote from: "neozeed": but with buggy hardware/drivers on the x86 I've experienced the same sort of thing... Although I'd be interested to know if you were on a managed switch?
I'm not sure what a managed switch is. I started with a 10base-T hub, then upgraded to/added a 10/100 hub/switch. You know, the kind where there is a 10 half and a 100 half and a switch between the two halves. If I understand the technology correctly.
The 10base-T hub had 8 10base-T ports and 1 10base-2 port that I used for the NeXT black hardware. I eventually had some problems there, but I think those machines are still on the 10base-2, so I just don't remember. The 10base-T/2 hub still lives attached to a port on the 10/100, so I guess it never really died.
Quote: Also, if
netstat -ni
produced any interesting info, as far as errors? Serious stuff gets written to /usr/adm/messages.
I'm quite familiar with /usr/adm/messages, but never saw many clues.
Once I theorized that it was a memory leak and related to the number of packets, I would log the total packet counts (possibly using netstat, if I recall) and see how long it would take before it would hang. I think I even set up a cron job to log this info, so I could check it on the next reboot.
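A logging job like that might have looked something like this crontab entry. The interval, log path, and exact commands are made up for illustration (and old crons want explicit minute lists rather than the modern */10 shorthand):

```shell
# Hypothetical crontab entry: every 10 minutes, append a timestamp plus the
# interface packet/error counters to a log for the post-reboot post-mortem.
0,10,20,30,40,50 * * * * (date; netstat -ni) >> /usr/adm/pktcount.log 2>&1
```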
After I gave up on the SPARC and was simply running the machine long enough to copy data off to other machines, I noticed that there was a very predictable number of packets before wedging. So I just broke up my tar files into small enough chunks that I could copy one off, reboot, copy the next, reboot, etc.
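That chunk-and-reboot workaround can be sketched roughly like this. File names and the 512 KB chunk size are illustrative (the real threshold would depend on how many packets the card survives), and the dd line just manufactures a stand-in archive for the demo:

```shell
#!/bin/sh
set -e
cd /tmp
# Manufacture a stand-in "big" archive (2 MB of zeros) for the demo.
dd if=/dev/zero of=backup.tar bs=1024 count=2048 2>/dev/null
# Break it into pieces small enough to copy before the interface wedges.
split -b 512k backup.tar backup.part.   # -> backup.part.aa, backup.part.ab, ...
# ...copy each backup.part.* across, rebooting between transfers...
# On the far side, reassemble in glob (alphabetical) order and verify:
cat backup.part.* > backup.tar.restored
cmp backup.tar backup.tar.restored && echo "archive intact"
```

Since split names the pieces in alphabetical order, a plain glob reassembles them correctly, and cmp (or a checksum) confirms nothing was lost across all the reboots.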
Now that I think about it, /usr/adm/messages is probably where I was seeing the "out of memory" or "unable to allocate memory" errors. I should just start that old thing and see what's in the file. ... I'll do that next time I'm in the server room (museum).
Managed switches are ones you can talk to, to read the errors on the other side and to force speed/duplex. I have had tonnes of issues with SPARCs and Cisco switches screwing stuff up. The funniest part is that for some the best way to perform is to set them up incorrectly... i.e. 100/half on the switch and 100/full on the SPARC... lots of errors, but way faster than doing it 'correctly'.
As for its errors, it's hard to say... You were probably one of maybe 30 people using NS/SPARC...
Too bad it doesn't run on Ultras... :|