Indeed, that's a very promising thread. Seems a shame he wasn't able to sign up here!
Quote from: barcher174 on April 29, 2025, 10:04:09 PM
Good work going on here:
https://68kmla.org/bb/index.php?threads/nextstation-clock-doubler.49799/
Looks almost complete.
This is great news. I created an accelerator as well so we're about to have two. Exciting times.
Quote from: gtnicol on April 30, 2025, 01:52:33 AM
Indeed, that's a very promising thread. Seems a shame he wasn't able to sign up here!
One of our forum members talked to him about emailing me. I can get him signed up.
Fascinating stuff!
I guess a potential issue with making something for the cube is space constraints, due to the depth available to the motherboard against the PSU housing. There's also the issue of certain 040 motherboards having the CPU soldered in place rather than socketed.
I do have a spare L88M 040 waiting for just such an upgrade 8)
Quote from: Nitro on April 30, 2025, 02:22:18 AM
This is great news. I created an accelerator as well so we're about to have two. Exciting times.
One of our forum members talked to him about emailing me. I can get him signed up.
Now I'm curious about your one too!
Greetings. That's my work :)
I've still got a DMA issue to sort out but other than that it's working well. I've got a tray of QFP 68040s in hand, depending on what I need to do to figure out the last problem a board spin might be required. (may be problematic due to the current tariff situation)
DMA seems to work correctly at least for disk IO, ethernet TX, but ethernet RX is lossy and sound likewise acts similarly. Video is also fine (though I'm not convinced it actually uses DMA for much), haven't tested serial, sound in, floppy, dsp or printer DMA channels yet. I think it's an edge case on bus arbitration but I haven't managed to pin down what exactly is wrong yet.
The plan would be to use a low profile HSF to avoid clearance issues and guarantee cooling, but I have *no* idea what the inside of a cube looks like and from one pic I saw it may require a specific design.
Quote from: Nitro on April 30, 2025, 02:22:18 AM
This is great news. I created an accelerator as well so we're about to have two. Exciting times.
Would be curious to hear more. I assume it is also a clock doubler type. Reverse engineered, or a new design?
Quote from: zigzagjoe on April 30, 2025, 09:50:45 AM
Nice, How much does something like that cost? I remember reading on Amiga forums they had to be careful of buying fake 060s and 040s so they used to recommend trusted sellers.
Quote from: pTeK on April 30, 2025, 02:17:27 PM
Nice, How much does something like that cost? I remember reading on Amiga forums they had to be careful of buying fake 060s and 040s so they used to recommend trusted sellers.
Yes, it is very difficult to source vintage 040s, 060s, and 68882 FPUs without running afoul of remarked parts. I work with Eric Woo in China to get non-remarked chips that meet my fairly particular requirements. I'm more known for making Macintosh accelerators and video cards and all of the CPUs and FPUs on my accelerators came from Eric.
These full 040 (not EC, LC, or V-suffix) QFP chips in particular are hard to price as they have to be salvaged from boards. Despite being made through 2016 or so they were apparently very uncommon parts. In fact, I haven't even been able to find a datasheet from Motorola that even acknowledged full 040s were even made in the QFP package. The total supply that Eric was able to find for me in the last year or so wasn't even 100 chips total, and there's no chance of finding (authentic) QFP 040s on the open market in my experience.
That said there's no requirement to use QFP chips, I just find them easier to route and solder, and it's easier to get late-mask chips that can do 50mhz.
Made some headway on the DMA issue. As I thought, it's a bus arbitration issue: the arbiter needs to recognize the deasserted state of BB for at least a single LB BCLK cycle before it's able to register it as being asserted again. Otherwise, arbitration hangs as it thinks the bus is still busy. Still trying to figure the exact timings it wants, but with a temporary fix in place it corrected the sound and ethernet RX issue I'd been having.
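To illustrate the kind of failure described above, here is a toy model only; it is not the CPLD logic and not NeXT's arbiter. It assumes the arbiter samples BB once per BCLK while the doubled CPU can change BB on either of two CPU-clock ticks per BCLK. The function names, the boolean "asserted" representation of the active-low signal, and the waveform are all made up for the example. A release that lasts a single CPU-clock tick can fall between two sample points, so the arbiter never sees the bus go idle; stretching the release to a full BCLK makes it visible at either sampling phase.

```python
def arbiter_samples(bb_asserted, phase=0):
    """What a BCLK-clocked arbiter sees: every second CPU-clock tick."""
    return bb_asserted[phase::2]


def arbiter_sees_release(bb_asserted, phase=0):
    """True if at least one sampled tick shows BB deasserted."""
    return not all(arbiter_samples(bb_asserted, phase))


def stretch_release(bb_asserted, min_ticks=2):
    """Hold BB deasserted for at least min_ticks CPU-clock ticks.

    A reassertion that arrives too early is delayed by keeping the
    signal deasserted for the offending tick(s).
    """
    out = []
    run = 0  # length of the current deasserted run in the output
    for asserted in bb_asserted:
        if asserted and 0 < run < min_ticks:
            out.append(False)  # too early to reassert: keep BB released
            run += 1
        else:
            out.append(asserted)
            run = 0 if asserted else run + 1
    return out


# Hypothetical waveform: a one-tick release between two bus tenures,
# landing between the arbiter's sample points (phase 0).
wave = [True, True, True, False, True, True, True, True]
print(arbiter_sees_release(wave))                   # release is missed
print(arbiter_sees_release(stretch_release(wave)))  # release is now visible
```

With min_ticks=2 the released state spans one whole BCLK period in this model, so it is caught regardless of which tick the arbiter samples on.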
Quote from: zigzagjoe on April 30, 2025, 09:50:45 AM
Would be curious to hear more. I assume it is also a clock doubler type. Reverse engineered, or a new design?
Welcome to the forums zigzagjoe! Looking forward to reading about your progress. I created a new thread in the Overclocking forum.
Quote from: zigzagjoe on April 30, 2025, 09:50:45 AM
Greetings. That's my work :)
I've still got a DMA issue to sort out but other than that it's working well. I've got a tray of QFP 68040s in hand, depending on what I need to do to figure out the last problem a board spin might be required. (may be problematic due to the current tariff situation)
DMA seems to work correctly at least for disk IO, ethernet TX, but ethernet RX is lossy and sound likewise acts similarly. Video is also fine (though I'm not convinced it actually uses DMA for much), haven't tested serial, sound in, floppy, dsp or printer DMA channels yet. I think it's an edge case on bus arbitration but I haven't managed to pin down what exactly is wrong yet.
The plan would be to use a low profile HSF to avoid clearance issues and guarantee cooling, but I have *no* idea what the inside of a cube looks like and from one pic I saw it may require a specific design.
Would be curious to hear more. I assume it is also a clock doubler type. Reverse engineered, or a new design?
Any chance of adding a bit of cache to this? If you do this, add as much as possible.
Would an 040 external cache be on similar lines to this?
https://www.seas.ucla.edu/spapl/bps/previous/cache.html
You don't seem to see external cache on Amiga 040 accelerators. The newer ones such as the BFG9040 seem to rely on much quicker ram access compared to the speed of ram on the legacy boards (like the WarpEngine 3040 or CyberStorm 040)
I've got a handle on the logic issue I was wrestling with, still need to verify that I've got it nailed down. The speed improvement is really quite nice under OpenStep 4.2. Built a second accelerator for the "control" case, and will make one more so I have a good sample size to test.
Current task is to finish logic verification and stability testing of a few boards on my turbo color slab. Then it will be time to test in cubes and non-turbo machines. Aside from the couple of bodge wires required, it's looking like the current boards will get the job done for an initial batch. Down the road I'd love to get some black PCBs made, but I don't enjoy populating passives by hand so this would depend heavily on how the tariff situation plays out.
Turbo machines will require the speed jumper installed in order to set the system bus to 25mhz. While it would technically be possible to make a design to run the bus at 33mhz while running the CPU from an uncoupled clock (say 50 mhz), this practically wouldn't bring much of an advantage due to the complications required to bidirectionally synchronize signals. The minor bandwidth increase would go right out the window.
Otherwise, so long as a CPU socket is present, you can just drop my accelerator in and go. Most of the fun parts are hiding on the bottom of the PCB, except for the CPU and CPLD. This basic design will physically work in both types of slabs. The non-turbo nextcube will require a specific design with different CPU rotation to work. Unfortunate, but unsurprising.
I haven't found a good picture of a cube turbo-chipset logic board with enough detail in order to understand the rotation of the CPU socket. A picture of beneath the board would work, possibly - as long as there's the missing pin in the CPU socket shown on the bottom. Otherwise, nudging the heatsink around enough to see some of the markings would do the trick too.
Cache: Down the road, perhaps; it's not an immediate goal for the first revision. The 68040 doesn't need external cache nearly as badly as the 68030 does due to the far more efficient (and larger) internal caches. On a Mac with a 40mhz bclk and a similar interleaved memory subsystem the improvement is perhaps 10% on average, though this gap would increase when the bus clock is half the speed of the CPU clock.
I quite like the 040 bus as it's noticeably simpler to deal with than the 030 bus I'm used to, and I am curious what cache would bring to the table. As I have a few reference implementations to consult, it's a possibility someday. However, it would incur a large increase in complexity and cost.
Here are some preliminary performance numbers.
That's really impressive, zigzagjoe! I think @spitfire is mentioning cache because it's one of the secret ingredients of the Nitro accelerator (https://web.archive.org/web/20100624095845/http://www.channelu.com/NeXT/Black/Nitro/index.html). Despite only using a 68040 @ 33 MHz, the Nitro's 128 KB of 4-way associative cache allowed it to beat the 50 MHz Pyro on compilation tests, taking 367.4 seconds vs. the Pyro's 495.9. You may want to run a compilation benchmark on your accelerator vs. your NeXTstation in stock configuration so we can approximate that same performance metric for comparison.
It appears that this (https://datasheet4u.com/datasheet/Cypress-Semiconductor/CY7B180-519596) is the cache chip on the Nitro, the CY7B180-12JC on the back of the board. There are definitely still supplies for it that exist. Though I'm seeing some other sites describe (mislabel?) that part number as being a programmable logic chip.
*cries with my soldered-CPU cube*
Quote from: Rhetorica on May 02, 2025, 03:19:32 PM
That's really impressive, zigzagjoe! I think @spitfire is mentioning cache because it's one of the secret ingredients of the Nitro accelerator (https://web.archive.org/web/20100624095845/http://www.channelu.com/NeXT/Black/Nitro/index.html). Despite only using a 68040 @ 33 MHz, the Nitro's 128 KB of 4-way associative cache allowed it to beat the 50 MHz Pyro on compilation tests, taking 367.4 seconds vs. the Pyro's 495.9. You may want to run a compilation benchmark on your accelerator vs. your NeXTstation in stock configuration so we can approximate that same performance metric for comparison.
It appears that this (https://datasheet4u.com/datasheet/Cypress-Semiconductor/CY7B180-519596) is the cache chip on the Nitro, the CY7B180-12JC on the back of the board. There are definitely still supplies for it that exist. Though I'm seeing some other sites describe (mislabel?) that part number as being a programmable logic chip.
Yup. That's more or less my thinking. But if you're putting cache on, put as much as you can reasonably wire up.
I do want one of these when they're available.
I don't currently have a toolchain installed, so I can't immediately do a compile benchmark. I think I've got povray working however so that'll be worth a shot as a point of comparison. Practically I expect performance to be identical to the Pyro accelerator unless they spectacularly screwed something up in their implementation.
A 128kb direct mapped cache would be the most likely possibility. An associative cache would be difficult to do quickly enough without requiring waitstates, and it would be much more complex than an already-complex direct mapped implementation.
Any cache would require TAG srams, and they're thoroughly obsolete. The macintosh accelerators I make require TAGs too and they're always the most annoying part to source after the CPUs.
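For a sense of what the TAG SRAMs would have to hold, here is a rough sketch of how a 32-bit address splits up for a 128 KB direct-mapped cache with the 68040's 16-byte lines. The assumption that the whole 32-bit address space is tagged is mine; a real design might only tag the cacheable DRAM region and use fewer tag bits. The function names and example addresses are hypothetical.

```python
def direct_mapped_layout(cache_bytes, line_bytes=16, address_bits=32):
    """Return the offset/index/tag bit widths of a direct-mapped cache."""
    lines = cache_bytes // line_bytes
    # Sizes must be powers of two for a clean bit split.
    assert line_bytes & (line_bytes - 1) == 0
    assert lines & (lines - 1) == 0
    offset_bits = line_bytes.bit_length() - 1
    index_bits = lines.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return {"lines": lines, "offset_bits": offset_bits,
            "index_bits": index_bits, "tag_bits": tag_bits}


def split_address(addr, layout):
    """Split an address into (tag, index, offset) for the given layout."""
    offset = addr & ((1 << layout["offset_bits"]) - 1)
    index = (addr >> layout["offset_bits"]) & ((1 << layout["index_bits"]) - 1)
    tag = addr >> (layout["offset_bits"] + layout["index_bits"])
    return tag, index, offset


layout = direct_mapped_layout(128 * 1024)
print(layout)
# Two hypothetical addresses exactly 128 KB apart land on the same cache line
# and are told apart only by the tag stored in the TAG SRAM.
print(split_address(0x04012345, layout))
print(split_address(0x04032345, layout))
```

Each of the 8192 lines needs its own tag entry (plus at least a valid bit), which is the part that has to be stored in fast TAG SRAM and compared on every cacheable access.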
We'll see where things end up!
Quote from: trixster on May 02, 2025, 12:29:33 PM
Would an 040 external cache be on similar lines to this?
https://www.seas.ucla.edu/spapl/bps/previous/cache.html
You don't seem to see external cache on Amiga 040 accelerators. The newer ones such as the BFG9040 seem to rely on much quicker ram access compared to the speed of ram on the legacy boards (like the WarpEngine 3040 or CyberStorm 040)
There's no specifics on his site so it's hard to say. At face value however I expect he was running a slow clock (25mhz or so) as none of the 68040 cache implementations I've looked at support no waitstates / bus retries at higher speeds. The two 40mhz rated ones I looked at used two wait states. This means any RAM read that isn't explicitly cache inhibited will always incur a few cycles of delay while the cache is checked for a hit. At that point the read would be allowed to generate an 'external' bus cycle on a miss or the cache data SRAM begins supplying data on a hit. This turns into a math problem: do the cache hits end up saving more time than the delay cycles cost?
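That math problem can be sketched roughly as follows. This is a simplified model, not a measurement: it assumes every cacheable read pays a fixed lookup delay, then costs either a hit time or the full DRAM time. The cycle counts in the example (2 lookup cycles, 5 for a hit, 10 for a miss) are placeholders chosen for illustration, not figures from any real board.

```python
def avg_cycles_with_cache(hit_rate, lookup_cycles, hit_cycles, miss_cycles):
    """Average cycles per cacheable read when every read pays the lookup delay."""
    return lookup_cycles + hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


def break_even_hit_rate(lookup_cycles, hit_cycles, miss_cycles):
    """Hit rate at which the cache costs exactly as much as having no cache.

    Solves lookup + h*hit + (1-h)*miss == miss for h.
    """
    return lookup_cycles / (miss_cycles - hit_cycles)


# Placeholder numbers: 2 wait states for the tag check, 5 cycles to serve a
# hit from SRAM, 10 cycles for a DRAM access on a miss (or with no cache).
h = break_even_hit_rate(lookup_cycles=2, hit_cycles=5, miss_cycles=10)
print(h)
print(avg_cycles_with_cache(h, 2, 5, 10))    # equals the no-cache cost
print(avg_cycles_with_cache(0.9, 2, 5, 10))  # above break-even: cache wins
```

Below the break-even hit rate the lookup delay costs more than the hits save; the closer the hit time is to the DRAM time, the higher that break-even point climbs.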
Past a certain point, as you say, it's simpler to just outright replace a RAM bank with an independent DRAM controller (or some SRAM) to support higher speed operation. It's simpler(ish) and easier than a cache. The difficulty here, however, is that NeXT is a DMA-first architecture: your new DRAM array must be able to service alternate bus masters while also suppressing the onboard DRAM controller, which may not be possible.
The other part of why this is common on Amiga is that, from what I've seen, Amigas have a pretty poor architecture in the later machines; the 040 systems are actually running an 040 on an 020/030 bus. This is going to incur nasty bus translation penalties and may not support synchronous burst reads / interleaved access on the RAM. So, an 040 ends up starved of memory bandwidth; this is a similar situation to what the Macintosh accelerators for the earlier 030 mac machines suffer from.
By comparison, an 040 clock doubler that abides by Motorola timings (what I'm doing) on a native 040 bus will have an average 30ns penalty on each 'external' bus access, for an absolute worst case of about 80% maximum bus bandwidth as compared to a stock CPU. If allowed by the bus arbiter, that penalty is made up for (and then some) by the doubled CPU being able to issue back-to-back transfers nearly instantly compared to the stock CPU. So while there's further pressure on DRAM access with the faster CPU, it's not nearly as dire a situation as on Amiga given the bus is native synchronous 040 and supports line transfers.
A quick clarification as I've received some questions already; a clock doubler type accelerator as I'm implementing leaves the bus speed unmodified; it isn't overclocking the logic board and requires no major modifications to the board. Turbo systems require a jumper installed to reduce BCLK to 25. So, CPU speed becomes 50mhz but the bus remains 25mhz. Work that can be entirely done within the CPU's caches receives the biggest performance increase, but IO activity like SCSI access remains exactly the same.
This is opposed to the BCLK overclocking (https://www.nextcomputers.org/forums/index.php?topic=56) method seen here where an external PLL is used to double the frequency fed into the logic board PLL (de-soldered and replaced). This is running the system bus (BCLK) at 50mhz overclocking the entire system. This improves performance across the board, especially video, but stability would have to be assessed on a per-system basis as it's far out of spec. Also, it's probably only possible for Turbo systems. Greater performance - probably best possible from black hardware - but an invasive modification.
Finally, povray numbers.
I need to see if I can find some nice synthetic benchmarks that stress memory access and direct video access. Recommendations appreciated!
So far it seems the doubler works out about 60% faster than a 25mhz (non-turbo with turbo chipset) system, or 25% faster than a 33mhz (Turbo-branded) system. I haven't yet tested in a non-turbo box. Or rather, the only non-turbo slab I have doesn't have a monitor/keyboard/mouse so all I've been able to validate is that the LED turns on and off according to DRAM/VRAM (?) tests same as the stock CPU does. So it's alive, at a very basic level.
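As a back-of-envelope reading of the 60% figure (an Amdahl's-law sketch, not a measurement): if only the cache-resident part of the work scales with the doubled core clock and everything bus-bound stays at 25mhz speed, the observed speedup implies how much of the run time was CPU-bound. The function names are made up for this example, and the clean split into "scales with the core" and "does not scale" is a simplifying assumption.

```python
def amdahl_speedup(cpu_bound_fraction, core_speedup=2.0):
    """Overall speedup when only the CPU-bound fraction runs core_speedup times faster."""
    return 1.0 / ((1.0 - cpu_bound_fraction) + cpu_bound_fraction / core_speedup)


def implied_cpu_bound_fraction(observed_speedup, core_speedup=2.0):
    """Invert Amdahl's law: what fraction must have been CPU-bound?"""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / core_speedup)


# 60% faster than the 25mhz system means an observed speedup of 1.6x.
fraction = implied_cpu_bound_fraction(1.6)
print(fraction)
print(amdahl_speedup(fraction))
```

Under those assumptions, a workload that lives entirely in the CPU caches would approach 2x, while one that is entirely bus-bound would see no gain at all.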
Zigzagjoe, many thanks indeed for the super detailed reply, very interesting stuff!
Quote from: zigzagjoe on April 30, 2025, 05:12:13 PM
These full 040 (not EC, LC, or V-suffix) QFP chips in particular are hard to price as they have to be salvaged from boards. Despite being made through 2016 or so they were apparently very uncommon parts. In fact, I haven't even been able to find a datasheet from Motorola that even acknowledged full 040s were even made in the QFP package. The total supply that Eric was able to find for me in the last year or so wasn't even 100 chips total, and there's no chance of finding (authentic) QFP 040s on the open market in my experience.
Would be interesting to know what the 2016 CPUs were used for. I'm glad the engineers selected them :)
Alright, here's some more benchmarks.
First, thanks Nitro for that link. That definitely works for a practical test as these systems are C/Obj-C centric so it should reflect real world performance nicely.
Copying a bit from my post on 68kMLA: Due to efficiencies elsewhere the real-world throughput to DRAM via memcpy appears to be improved. Below 4K cache sizes, the speed increase is linear since that's entirely within CPU caches. Above that is where DRAM performance comes into play. I'll have to write a simple assembly memory benchmark at some point to get closer to the maximum as who knows how effective the C memcpy implementation is. Still, it's a very good sign for practical use.
Turbo systems need to be downclocked to 25mhz bus first.
Conceptually, this would reduce IO and RAM performance, if all else is held equal and we're only looking at raw throughput. However, this is offset by faster cpu performance (and additional waitstates @ 33mhz) so the effective performance in the worst case scenarios ends up the same as before. More often, it's slightly better. It can be seen in the earlier benchmarks that the IO performance is effectively a wash (disk, video) or slight gain (network).
Here's some compile benchmarks. I just took the bison package from the GNU Sources, copied it to /tmp, ./configure, then make; make clean; make to get these numbers. Looks like a nice improvement.
Final testing of the accelerator in the Turbo platform is just about complete. All outstanding issues addressed and it works brilliantly. I've built 8 of them now to verify consistency is good. I've had a bit of trouble with my QFP CPUs, but I narrowly have enough for a first batch. Future batches will be PGA CPUs (probably bring your own). Still haven't figured out what I'm going to call it.
I am leaning towards a small heatsink-fan as the most appropriate way to cool these. Technically, a large heatsink allows the stock system fan to sufficiently cool the CPU also but I don't know that'd be a reasonable assumption as I'm assuming there's a tendency to replace the noisy system fan with Noctua as is unfortunately popular in the mac community too. Also, operation with the lid removed would become a no-go.
I expect that the eventual PGA version would include the heatsink-fan with appropriate mounting hardware if I am unable to supply a PGA CPU. PGA CPUs will require an active cooler, especially the 0.65um parts. The original CPU will not work @ 50mhz; a replacement CPU of one of the following masks should manage 50mhz with cooling: L88M, K63H, E42K, E31F. Not sure if I could find little sachets of thermal paste, though...
I found that curl takes *a lot* of CPU time to draw its status. It looks like you're well advised to use the silent switch on large downloads: that increases speed by 2x or more! Here's updated benchmarks with that, and a small file copy benchmark. This continues the trend that IO performance is essentially identical to a 33mhz system (despite 25mhz bus speed) but anything involving primarily CPU like compiling, rendering, and daily use benefits nicely.
Complete data here:
https://i.imgur.com/5PXoF07.png
Still haven't got any testing done with the non-Turbo platforms aside from the extremely basic test I did previously. I'm pretty sure this is a no: Is there a way to configure the ROM monitor for serial console without having a display/mouse/keyboard?
It looks like color slabs may present a problem with the positioning of the DSP RAM socket - definitely won't be able to use a DSP SIMM. If it's like the mono slab (visually seems to be) the positioning of the floppy drive means I can't raise the accelerator with an extra socket to clear the DSP SIMM.
Amazing stuff :o
Quote from: zigzagjoe on May 10, 2025, 12:21:34 PM
I am leaning towards a small heatsink-fan as the most appropriate way to cool these. Technically, a large heatsink allows the stock system fan to sufficiently cool the CPU also but I don't know that'd be a reasonable assumption as I'm assuming there's a tendency to replace the noisy system fan with Noctua as is unfortunately popular in the mac community too. Also, operation with the lid removed would become a no-go.
Can you do a large heatsink, with a large slow fan instead? Those small fans tend to get whiny, and gum up quickly.
I need to pull the motherboard out of my cube to see if the 040 is one of the early socketed ones ;D
Quote from: spitfire on May 10, 2025, 02:48:40 PM
Can you do a large heatsink, with a large slow fan instead? Those small fans tend to get whiny, and gum up quickly.
Finding heatsink-fan combos below 40mm tends to be difficult, but we'll see. Noise-wise these are lost in the sound of the system fan. I haven't had any particular issue with them clogging in the Macintosh accelerator designs I originally used them for.
Truthfully, I'd rather the heatsink alone anyways, but we will see what ends up practical. This early design with soldered QFP CPUs is a bit different from later designs since these use late model chips that require less cooling. I'm more concerned with getting the design fully tested first, still waiting to get that done locally. Not particularly worried about Turbo chipsets but I need to make sure the older bus arbiter/DMA controller behaves similarly, as well as the physical fitment stuff.
Due to some issues with the QFP chips, here's an image of the late 040 die (a bad chip underwent an unscheduled rapid disassembly during desoldering...)
do you have an order queue yet ;D
Quote from: luVWagn on May 16, 2025, 03:45:27 PM
do you have an order queue yet ;D
Ha, there is no shortage of interest but there's still testing to be done before I'd want to release these into the wild. So far they've been working great on my Turbo Color station; I've run a 3 day povray render, days of constant verified disk activity, and just generally giving it hell without a hiccup.
I did get to test it on a non-turbo NeXTStation today.... and it worked! I will need to do some testing to make sure it's stable and all functions work however.
Extremely early testing numbers:
Interesting insights: the Non-turbo machine actually seems to have *better* memory bandwidth than the Turbo machines. I am assuming the main benefit of the interleaved DRAM (and faster DRAM) in the Turbo architecture is quicker random access rather than necessarily more bandwidth. Also interesting is that the expected memory performance loss due to the doubler is seen here where it is not present on the Turbo. I'll have to take a look at actual timings, though - it might be a delay due to bus arbitration. I really don't like how NeXT handled DMA.
These numbers shouldn't be compared against my Turbo numbers as I haven't yet controlled for the SD card/disk image differences. I don't *think* that's a problem, but I want to eliminate it to be sure. Disk access does seem slower than I'd expect given the SCSI controller is identical between turbo and non-turbo machines.
Also amusing is this will generate a Timer test failure with code C3 at boot - the timer test sets a millisecond timer and has the CPU wait to verify that it receives an interrupt in that expected time. However it does not control for execution speed in the busy loop (https://github.com/johnsonjh/NeXTROM/blob/5dbd000731cb80b84644875d65b06b814443f251/pot/POT.c#L1036), so that means an accelerated CPU will always cause this error since it is faster. Apparently it affected the Pyro too. (https://www.nextcomputers.org/forums/index.php?msg=22400)
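The failure mode can be modelled in a few lines. This is a hypothetical model of a fixed-count busy loop, not a transcription of the ROM's POT.c; the iteration count, cycles per iteration, and timer period below are invented so that the loop outlasts the timer at stock speed.

```python
def timer_test_passes(cpu_hz, loop_iterations, cycles_per_iteration, timer_ms):
    """Model a test that busy-waits a fixed number of iterations for an interrupt.

    The test passes only if the busy loop lasts at least as long as the
    timer period, i.e. the interrupt arrives before the loop gives up.
    """
    busy_wait_s = loop_iterations * cycles_per_iteration / cpu_hz
    return busy_wait_s >= timer_ms / 1000.0


# Assumed values: 3750 iterations of 10 cycles each is 1.5 ms at 25 MHz,
# comfortably longer than a 1 ms timer. At 50 MHz the same loop is 0.75 ms.
for hz in (25e6, 50e6):
    print(hz, timer_test_passes(hz, loop_iterations=3750,
                                cycles_per_iteration=10, timer_ms=1.0))
```

Because the loop length is counted in iterations rather than in elapsed time, doubling the core clock halves the wait while the timer period stays put, so the loop runs out before the interrupt arrives.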
So far so good with the non-turbo system. I can't really stress test the video/audio given I don't have a monitor, so instead I am testing it via serial console/network. Incidentally, at least for OpenStep 4.2 if you run headless you need to disable the windowserver in ttys otherwise it will always panic just after enabling the audio kernel service at the /very end/ of boot. That caused me to waste some time troubleshooting what I thought was an unstable system or accelerator!
Here's updated benchmarks controlled for the disk image and emulator used. Apparently disk access slows somewhat the more files on the filesystem? Prior numbers were all without developer tools installed, so I picked up *a lot* of new files on disk.
Doesn't seem like the turbo chipset makes much of a difference @25mhz, but non-turbo 25 vs turbo 33 seems to back up NeXT's performance claims and my accelerator clearly benefits from the Turbo chipset as well.

(https://i.imgur.com/4KzxM4Z.png)
Nice to see a standard cube will beat a stock turbo. Not directly related to your project I've always wondered how a 50mhz 030 would have done against the 25mhz 040.
In Amigas using benchmarks available for that system, a 50mhz 030 returns iro 10MIPs whereas a 25mhz 040 is iro 18MIPs
That jibes with my experience on the mac side too. I did quite a bit of tinkering as I made some 030 and 040 accelerators for Mac. Under ideal conditions an 030 will approximately match an 040 of half the clockspeed.
However, the caveat there is ideal conditions for an 030 are a 0 wait state external SRAM cache coupled with fast RAM and/or an exotic DRAM subsystem like the Mac IIfx used. Expensive and complicated. Meanwhile, the 040 is fine with commodity DRAM and a "standard" controller (so long as it supports bursts) due to the much more efficient internal caches.
So, from a system design perspective the 040 was immediately more practical as soon as it was out as compared to trying to squeeze every last bit of performance out of an 030.
Well I just popped open my non-turbo cube to have a look at the cpu, and it appears that the cpu is socketed :) So all set for accelerator activity in the future, fingers crossed 8)
Still working on getting qualification done. I just borrowed a color slab, installed a socket in it and started some testing. Seems to work just fine, physical fitment issues notwithstanding. Argh: as I was posting this it decided to throw a bus error on me while doing a network stress test. That said, it's not been recapped (except for the PSU) and only has 16MB of RAM, so I don't know how trustworthy it is.
Fitment is going to be the real problem as I don't want to do a board per different model if I can help it. I am holding off on any design work while I wait on testing, also, I'm playing with other things :)
Here's what I've found so far... it's more complicated than I'd like.
Mono Non-turbo: Requires a 3D-printed insulator across the ground-bar and the bottom of the floppy bracket. Sensitive to socket height. ZIP VRAM will contact PGA CPU pins, not ideal, but may have to be tolerated.
Color non-turbo: Marginal fitment. Prototype board is too long and precludes use of DSP socket (rides overtop). Insulation required between top of DSP socket and board and slightly raises the rear of the accelerator. Very not ideal, but should work with stock socket. Insulation required between ground bar and bottom of floppy bracket (same as nonturbo mono). Sensitive to socket height. PGA CPU requires a shorter board.
Mono turbo: Untested, but visually appears to share component positioning with Color Turbo. Perfect fitment, no height sensitivity on the socket or PGA CPU.
Color turbo: Perfect fitment, no height sensitivity on the socket or PGA CPU.
Nonturbo cube: Requires new board design or socket spacers for prototype boards. Backplane will need to be modified to switch slots #0 and #4 (CPU board in rightmost slot, viewed from rear) or to switch #0 and #2 (CPU board one from left). The slot "above" the CPU board physically will likely not be usable.
Turbo cube: It is a mystery!
Explanations in no particular order:
Future PCBs will be designed for soldered PGA CPUs and/or PGA sockets. QFP CPUs are nice but impossible to source. Only the prototype boards will use these CPUs. It's a shame as they are very nice chips!
Height sensitivity on non-turbo models: Stock CPU top is 9.5mm above the board excluding heatsink. Low profile third party sockets may not insert completely or require a second socket to act as a spacer, which then introduces a risk of it being too tall and hitting the floppy bracket. And by risk, I mean it definitely will contact: I've got a board with a third party socket and with the height being 11mm or so the floppy bracket will push on the top of the PCB. Insulation is required. Technically it works, but it's not pleasant....
Cube: I expect slot remapping will always be needed even with a new board design, it was similarly required for the Pyro accelerator and is required to clear the CPU heatsink/cooler. It'd be difficult to avoid this and still meet cooling requirements.
Insulators: I've printed some tiny bits in PAHT (high temperature tolerant) that go over offending ground rails and floppy brackets. These will probably always be required on the non-turbo slabs, just to be safe.
It's definitely frustrating how fraught the fitment is inside these machines. Probably a large part of why there is only the one vintage accelerator design. Moving to a PCB "sandwich" would alleviate some issues but I don't know of a way to make that happen at a level of effort that I would be able to put into this (as well as keeping cost under control...)
Let me see if I can get you some good pictures of a turbo cube board sometime in the next week or so (unless someone else beats me to it)...
Quote from: mikeboss on May 28, 2025, 03:15:38 AMturbo cube MLB:
https://i.imgur.com/zuqbSLa.jpeg
In particular I'm needing to figure out the CPU rotation, so a picture of the bottom of the PCB (so the missing pin can be seen) is needed. Or, the heatsink can be shifted around and a little bit of thermal paste removed so that the markings become evident.
Secondary task would be to identify the (unpopulated) 1x02 jumper that controls the bus speed, but that's far easier and just needs some probing.
Depending on how far you get without it, I might be able to send you a Turbo Cube board to play with. I'll be in Europe until the end of June though and my boxes are back in the US.
Quote from: gtnicol on May 28, 2025, 11:49:12 AM
Depending on how far you get without it, I might be able to send you a Turbo Cube board to play with. I'll be in Europe until the end of June though and my boxes are back in the US.
I appreciate the offer! Hopefully, it won't be needed as I don't have any cube bits to boot one up with... just need to get the CPU rotation and jumper identified. This does raise the point that I ought to confirm a dimension board works correctly too...
Some productive work today:
Turns out the NeXT engineers made a bit of a design gaffe on the non-turbo hardware; it doesn't actively deassert /TA, it's only pulled up. This is the transfer acknowledge pin that tells the CPU the bus cycle is concluding. Too much capacitance on that and it won't rise in time before the CPU starts another bus cycle, at which point the CPU will load invalid data as it takes it as an immediate acknowledge. You're supposed to actively de-assert (drive high) before tri-stating the line, not just tri-state it and allow pull-up to return it back high. As is it takes an extra 25ns to rise which is far too slow. This flaw was fixed on Turbo hardware; it would have precluded higher clock speed operation. Happily, I can fix this in logic.
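For a feel of why a pull-up alone is slow, here is the standard RC charging formula applied to the line. The component values are assumptions picked for illustration (a 1 kΩ pull-up, 50 pF of total line capacitance, 5 V rail, 2.0 V input-high threshold), not measurements from a NeXT board, and the function name is made up. They happen to land near the 25ns mentioned above, but the point is only how the rise time scales with capacitance.

```python
import math


def pullup_rise_time_s(r_ohms, c_farads, vcc=5.0, v_start=0.0, v_threshold=2.0):
    """Time for an RC pull-up to charge a line from v_start to v_threshold.

    Uses v(t) = vcc - (vcc - v_start) * exp(-t / (R * C)) solved for t.
    """
    return r_ohms * c_farads * math.log((vcc - v_start) / (vcc - v_threshold))


# Assumed values: 1 kOhm pull-up; 50 pF, then 100 pF once extra load is added.
for c in (50e-12, 100e-12):
    print(c, pullup_rise_time_s(1000.0, c) * 1e9, "ns")
```

In this model the rise time grows in direct proportion to capacitance, whereas actively driving the line high before tri-stating it makes the edge largely independent of the pull-up.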
The slightly dubious color slab I'm testing with appears to have been unhappy with some bad contact on the SIMMs. Eventually, it stopped booting at all until those were reseated. Better that than an actual bug!
Quote from: mikeboss on May 28, 2025, 03:15:38 AMturbo cube MLB:
https://i.imgur.com/zuqbSLa.jpeg
That's a nice pic :) . Too bad there isn't a high quality video of one being made on the Assembly line in California. I've seen that low quality video on youtube which is still impressive.
Maybe this will help? Front and back of the same board.
Quote from: gtnicol on May 28, 2025, 04:26:15 PMMaybe this will help? Front and back of the same board.
Perfect, thank you! The turbo cube will be able to share accelerators with the slabs as well as use the current design. The accelerator will extend parallel with the RAM modules - can't be overly wide but shouldn't be an issue at all.
So it's really just the non-turbo cubes that are a major problem, but it may end up just being easier bundling a socket for use as a spacer rather than a specific board revision just to rotate the CPU - if the rotated design would still be too tall to go in Slot #0. I'll have to play with a physical cube to figure it out.
Quote from: zigzagjoe on May 03, 2025, 03:38:52 PM...
... I haven't yet tested in a non-turbo box. Or rather, the only non-turbo slab I have doesn't have a monitor/keyboard/mouse so all I've been able to validate is that the LED turns on and off according to DRAM/VRAM (?) tests same as the stock CPU does. So it's alive, at a very basic level.
Quote from: zigzagjoe on May 10, 2025, 12:21:34 PM...
Still haven't got any testing done with the non-Turbo platforms aside from the extremely basic test I did previously. I'm pretty sure this is a no: Is there a way to configure the ROM monitor for serial console without having a display/mouse/keyboard?
...
I see you managed to test on a non-turbo nextstation. How? Did you manage to get a keyboard/monitor?
- Regarding the keyboard (non-adb) i have made a keyboard emulator on an ESP32
- Regarding the soundbox/monitor replacement to hookup the keyboard, i have adapted VHDL code on a Tang Nano 9K FPGA
- Regarding monitor, you can use a normal VGA monitor on a mono nextstation, or a VGA monitor supporting sync-on-green for the color nextstation
- Regarding the serial console, you can configure a DHCP server and TFTP server and boot my linux kernel which configures the serial console on the PROM. The serial console will only work if you have no monitor/keyboard attached.
Just remove the RTC battery, and the default on the nextstation is to boot from the network via bootp(DHCP)/tftp
FPGA/ESP32 code:
https://www.nextcomputers.org/forums/index.php?topic=4481.msg32396#msg32396
https://www.nextcomputers.org/forums/index.php?topic=5337
linux kernel serial enable from TFTP "boot en":
https://www.nextcomputers.org/forums/index.php?topic=5107.msg32400#msg32400
https://www.nextcomputers.org/forums/index.php?topic=4814
Quote from: ramalhais on May 28, 2025, 05:38:08 PMI see you managed to test on a non-turbo nextstation. How? did you manage get a keyboard/monitor?
- Regarding the keyboard (non-adb) i have made a keyboard emulator on an ESP32
- Regarding the soundbox/monitor replacement to hookup the keyboard, i have adapted VHDL code on a Tang Nano 9K FPGA
- Regarding monitor, you can use a normal VGA monitor on a mono nextstation, or a VGA monitor supporting sync-on-green for the color nextstation
- Regarding the serial console, you can configure a DHCP server and TFTP server and boot my linux kernel which configures the serial console on the PROM. The serial console will only work if you have no monitor/keyboard attached.
Just remove the RTC battery, and the default on the nextstation is to boot from the network via bootp(DHCP)/tftp
FPGA/ESP32 code:
https://www.nextcomputers.org/forums/index.php?topic=4481.msg32396#msg32396
https://www.nextcomputers.org/forums/index.php?topic=5337
linux kernel serial enable from TFTP "boot en":
https://www.nextcomputers.org/forums/index.php?topic=5107.msg32400#msg32400
https://www.nextcomputers.org/forums/index.php?topic=4814
An acquaintance brought a monitor and keyboard and I set the PRAM that way. Unfortunately, I ended up clearing the PRAM and it ends up stuck trying to netboot. Your kernel looks useful and I tried to use it to get going again. However, NeXT refuses to work with the bootp servers I have to hand and I don't care to devote the effort to build a vintage netboot setup or otherwise hack together a way to interact with it. Possibly I'll poke at hacking the ROM a bit more to get around that, but I'm a bit too frustrated with it right now so I may just wait until I can borrow a monitor/keyboard again.
Quote from: zigzagjoe on May 28, 2025, 08:16:20 PMAn acquaintance brought a monitor and keyboard and I set the PRAM that way. Unfortunately, I ended up clearing the PRAM and it ends up stuck trying to netboot. Your kernel looks useful and I tried to use it to get going again. However, NeXT refuses to work with the bootp servers I have to hand and I don't care to devote the effort to build a vintage netboot setup or otherwise hack together a way to interact with it. Possibly I'll poke at hacking the ROM a bit more to get around that, but I'm a bit too frustrated with it right now so I may just wait until I can borrow a monitor/keyboard again.
If you wanna give it a try, this dhcpd config works for me:
https://github.com/ramalhais/linux/blob/linux-6.9.y-NeXT/arch/m68k/tools/next/tftpboot/dhcpd.conf
Just change the IPs:
- subnet 192.168.111.0 netmask 255.255.255.0 {
- option routers 192.168.111.1;
- next-server 192.168.111.1;
And try setting up ARP for the nextstation manually:
arp -s 192.168.111.12 00:00:0f:00:30:90
PS: the "boot" file is the kernel in a.out or mach-O format
I found it acted the same with the ISC DHCP server and a modified version of your config, but inexplicably it started working with another random DHCP+TFTPD server (haneWIN?). Might have been the ARP thing. Got the "booting linux" message and nothing else, but after waiting a while it clearly did work on the NVRAM, as on reboot the settings had changed.
Of course, I promptly tried to replicate the particular configuration I had previously and ended up back up a creek. I'll have to fiddle with it some more and/or fix the ROM.
Looks like the trick may be the static arp combined with unicast replies for bootp.
Not much to report here as I've taken a side quest into cache testing on Macintosh. I've taken a macintosh cache design and have reworked the logic to work with one of my macintosh doubler prototypes. It's electrically not an ideal situation but it is enough to qualify if there was further value in developing in this vein.
Here are some macintosh-centric benchmarks:
https://i.imgur.com/WOkeWTZ.png
These are all purely synthetic and not a great mapping to the NeXT, but it shows there's value. The video performance in particular should be taken with a grain of salt as much of the benefit there is probably specific to how the macintosh drawing routines work and doesn't necessarily map to display postscript. I did however do a compilation benchmark (Doom) and the 128K cache reduced compile time from 247 seconds to 184 seconds, 25% faster. Doom itself saw about a 20% increase in speed. Hypothetically, I think this would be competitive with the Nitro...
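For anyone comparing these numbers against other accelerators, a quick sketch of the arithmetic behind the compile benchmark. The two helper names are just for illustration; the 247 and 184 second figures are the ones quoted above.

```python
def percent_time_saved(before_s, after_s):
    """Reduction in wall-clock time, as a percentage of the original time."""
    return (before_s - after_s) / before_s * 100

def speedup(before_s, after_s):
    """How many times more work gets done per unit time."""
    return before_s / after_s

saved = percent_time_saved(247, 184)
ratio = speedup(247, 184)
print(f"time saved: {saved:.1f}%, speedup: {ratio:.2f}x")
```

So "25% faster" here means about a quarter less compile time, which is the same thing as roughly 1.34x the throughput.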
The elephant in the room is cache coherency. If DMA is in use, care must be taken to maintain identical data in cache vs main RAM.
Most Mac cards aren't bus mastering, nor is anything on the Mac logic board, so it can be entirely disregarded for pure performance. NeXT uses DMA extensively, however, so the question is if all DMA accesses go to appropriately marked regions of RAM or if new DMA buffers can be allocated on the fly. If snooping is required on NeXT, this is going to hurt performance and also be a huge pain in my rear.
I don't know how comprehensive Previous's implementation of the MMU/cache bits is, but I may be able to use that to determine if those rules are ever violated and how cache flushes are handled.
Why this matters: if cache snooping isn't required, the accelerator can completely sidestep requesting the bus in order to run bus cycles as fast as possible with the hope that many of them could be filled from cache at 3x (or greater) speed. Worst case, by the time the cache has made a decision the bus should be available for an external cycle - hiding both the penalty of the doubler clock alignment variations as well as the cache lookup.
A few approaches are possible for cache coherency....
- if the external master never writes into a region of memory that was ever cachable, then no problem, because data will never enter external or internal caches (no work required at all)
- At bare minimum, the cache should invalidate the cache line so mismatched data isn't read. May allow higher performance under certain conditions.
- Ideally, the cache needs to snoop data written by an external master to maintain coherency. Complicated logic and sequencing. This approach can impact performance on accelerator designs.
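The last two approaches can be sketched with a toy model of a direct-mapped cache. This is only an illustration of the idea, not the accelerator's actual logic: the class and method names are made up, and the 16-byte line size is an assumption based on the 68040's cache line.

```python
LINE_BYTES = 16                        # assumed: 68040 cache line size
NUM_LINES = 128 * 1024 // LINE_BYTES   # 128K direct-mapped cache

class DirectMappedCache:
    def __init__(self):
        self.tags = [None] * NUM_LINES   # None means the line is invalid
        self.data = [None] * NUM_LINES

    def _split(self, addr):
        line = addr // LINE_BYTES
        return line % NUM_LINES, line // NUM_LINES   # (index, tag)

    def fill(self, addr, value):
        index, tag = self._split(addr)
        self.tags[index], self.data[index] = tag, value

    def lookup(self, addr):
        index, tag = self._split(addr)
        return self.data[index] if self.tags[index] == tag else None

    def dma_write_invalidate(self, addr):
        """Cheaper option: drop the line so the next read goes to RAM."""
        index, tag = self._split(addr)
        if self.tags[index] == tag:
            self.tags[index] = None

    def dma_write_snoop(self, addr, value):
        """Full snoop: capture the external master's data into the line."""
        index, tag = self._split(addr)
        if self.tags[index] == tag:
            self.data[index] = value

cache = DirectMappedCache()
cache.fill(0x1000, "old")
cache.dma_write_invalidate(0x1000)
after_invalidate = cache.lookup(0x1000)   # miss, CPU must fetch from RAM

cache.fill(0x2000, "old")
cache.dma_write_snoop(0x2000, "new")
after_snoop = cache.lookup(0x2000)        # hit, with the fresh data
```

Both variants need to see the external master's address, and the full snoop also needs its data, which is where the extra logic and sequencing comes from.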
This is all highly tentative though; a cached accelerator would be expensive compared to the "simple" clock doubler design I've been working on. Cache requires 9 buffers, 4 SRAMs, 2 obsolete tag SRAMs, a larger CPLD, a 6 layer board, and even more of my time to design, build, and test. Not that the cacheless design is cheap by any means but it would be much less than a cached version. For most purposes the simpler design will provide the best value as a drop-in upgrade.
Fascinating stuff! Keep the updates coming!
Quote from: zigzagjoe on June 02, 2025, 12:45:37 PMNot much to report here as I've taken a side quest into cache testing on Macintosh. I've taken a macintosh cache design and have reworked the logic to work with one of my macintosh doubler prototypes. It's electrically not an ideal situation but it is enough to qualify if there was further value in developing in this vein.
Here are some macintosh-centric benchmarks: https://i.imgur.com/WOkeWTZ.png
These are all purely synthetic and not a great mapping to the NeXT, but it shows there's value. The video performance in particular should be taken with a grain of salt as much of the benefit there is probably specific to how the macintosh drawing routines work and doesn't necessarily map to display postscript. I did however do a compilation benchmark (Doom) and the 128K cache reduced compile time from 247 seconds to 184 seconds, 25% faster. Doom itself saw about a 20% increase in speed. Hypothetically, I think this would be competitive with the Nitro...
The elephant in the room is cache coherency. If DMA is in use, care must be taken to maintain identical data in cache vs main RAM.
Most Mac cards aren't bus mastering, nor is anything on the Mac logic board, so it can be entirely disregarded for pure performance. NeXT uses DMA extensively, however, so the question is if all DMA accesses go to appropriately marked regions of RAM or if new DMA buffers can be allocated on the fly. If snooping is required on NeXT, this is going to hurt performance and also be a huge pain in my rear.
I don't know how comprehensive Previous's implementation of the MMU/cache bits is, but I may be able to use that to determine if those rules are ever violated and how cache flushes are handled.
Why this matters: if cache snooping isn't required, the accelerator can completely sidestep requesting the bus in order to run bus cycles as fast as possible with the hope that many of them could be filled from cache at 3x (or greater) speed. Worst case, by the time the cache has made a decision the bus should be available for an external cycle - hiding both the penalty of the doubler clock alignment variations as well as the cache lookup.
A few approaches are possible for cache coherency....
- if the external master never writes into a region of memory that was ever cachable, then no problem, because data will never enter external or internal caches (no work required at all)
- At bare minimum, the cache should invalidate the cache line so mismatched data isn't read. May allow higher performance under certain conditions.
- Ideally, the cache needs to snoop data written by an external master to maintain coherency. Complicated logic and sequencing. This approach can impact performance on accelerator designs.
This is all highly tentative though; a cached accelerator would be expensive compared to the "simple" clock doubler design I've been working on. Cache requires 9 buffers, 4 SRAMs, 2 obsolete tag SRAMs, a larger CPLD, a 6 layer board, and even more of my time to design, build, and test. Not that the cacheless design is cheap by any means but it would be much less than a cached version. For most purposes the simpler design will provide the best value as a drop-in upgrade.
DMA is used a LOT on NeXT systems and they had a custom VLSI that helped a lot there. That said, if I recall, the Mac IIfx also had and employed some DMA, and had some significant caching. Not sure how that maps from the Mac IIfx to NeXT but thought I'd mention it just in case it might be useful for some inspiration.
While the Mac IIfx did make some noise about DMA, I suspect it didn't use it nearly as extensively as NeXT did, simply because their OS was pretty lame at the time and I'm not sure their Unix OS at the time did anything there either.
Yes, NeXT uses DMA for everything where on Macintosh nearly nothing did. The IIfx was supposed to use DMA for at least SCSI, serial, and floppy, but in the end only serial and floppy DMA was used by Mac OS. A/UX did use the SCSI DMA. However, NuBus was intended to support bus mastering (DMA) since day 1, so they appear to have given enough care towards making sure that the Mac OS behaves reasonably sanely regarding memory pages intended for DMA.
I will need to verify that NeXTSTEP is appropriately marking pages of RAM as non-cacheable. I'd be amazed if it wasn't, but you never know... I have noticed the transfer cache inhibit bit is wired to the burst inhibit bit, which is slightly odd. It's probably just insurance.
If the MMU is appropriately set up, access to vram/devices will be inhibited which drives an output (CIOUT) intended to tell internal (and external) caches to not cache the bus cycle in progress. Just in case, I've figured out a way to build in an ability to invalidate external cache lines so as to allow new DMA buffers to be allocated on the fly.
I'm proceeding on to a board design with onboard cache anyways so we'll see how it works out.
Well, I was wrong. It's the worst case scenario: NeXTStep doesn't mark regions of RAM as non-cacheable nor does it inhibit the cache on those accesses. Instead, NeXTStep seems to take the approach of invalidating (via cpush) DMA buffers. This causes any dirty pages in internal caches to be written to memory, before invalidating all selected lines of cache. This is.... fine.... and could lead to better performance, but is a big problem when you have multiple tiers of cache.
On writes, the external cache will maintain consistency with the flushed pages so not a problem there. There's no expectation that data in RAM changes.
On reads, we have a problem. The 040 has the cache cleared for that particular region of memory, but when the 040 goes to read the new data placed there by the DMA it gets happily supplied with the stale data from the external cache. Instead of whatever new data the IO device placed in RAM.
Best I can tell there's no mechanism to capture those invalidations in external hardware (except for, you know, properly marking pages as non-cachable in the first place). So this is not easily solvable.
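The read problem can be reproduced with a toy two-level model. This is only a sketch: plain dicts stand in for RAM, the 040's internal cache, and the external cache, and the address is arbitrary.

```python
ram = {0x4000: "old"}
l1 = {}   # stands in for the 040's internal cache
l2 = {}   # stands in for the external cache on the accelerator

def cpu_read(addr):
    """Read through L1, then L2, then RAM, filling caches on the way back."""
    if addr in l1:
        return l1[addr]
    if addr in l2:
        value = l2[addr]
    else:
        value = ram[addr]
        l2[addr] = value
    l1[addr] = value
    return value

first = cpu_read(0x4000)   # both caches now hold "old"
ram[0x4000] = "new"        # DMA device writes fresh data straight to RAM
l1.pop(0x4000, None)       # kernel invalidates the 040's cache for the buffer
stale = cpu_read(0x4000)   # external cache never saw the invalidation

print(first, stale, ram[0x4000])
```

The second read returns the old value even though RAM holds the new one, because nothing told the external cache to drop its copy.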
One method of keeping cache coherency with alternate bus masters (DMA) is snooping their bus cycles. We know that an external transfer is in progress, so we capture the address on the multiplexed bus at /TS & ↑BCLK. If this address has a cache entry, we can either write the new data into the cache or at least invalidate the entry so it won't be supplied to the 040 without being updated.
NeXT Turbo and Mac are well behaved and can support both snooping approaches. NeXT non-turbo is not well behaved; it does not pass transfer start or transfer acknowledge strobes when an external master is driving the bus. I suspect there are private strobes to the memory controller being used instead. Without a way to know an alternate master's bus cycle is in progress, I'm up a creek: I can't grab the address or even know that I need to invalidate/snoop.
This leaves the least desirable approaches:
1) Always invalidate the full cache on any DMA access.
This has dire implications on the cache hit rate and would basically be the same as no cache if sound or ethernet activity is happening.
2) Patch software so that cache invalidation is caused in the external caches.
There is one out available: what I've found so far from the software side is based on the NeXTStep 2.x sources. The Nitro accelerator solved this issue somehow. If I can find some proof in later versions of how this was supported for the Nitro's cache then perhaps I can leverage that or do something similar. I feel like this issue would have occurred in the x86 boxes, too...
I don't have a huge amount of interest in starting to disassemble the *Step kernels, though, so I'm probably likely to pause this project for a bit and focus on the non-cached accelerator which still has some ironing out needed. If anybody has any pointers or interest in digging into the kernel, I'm all ears....
Here's the new board, anyways.
This adds a 128K direct mapped cache behind buffers so it's theoretically capable of supporting full-speed operation from cache while the bus is unavailable due to DMA access. I actually have found it can operate up to 57mhz with cache in my Mac testbed - ridiculously fast! That requires a bus overclock, however.
Cache supports up to 160MB/s of bandwidth on cache hits using 2:1:1:1 timing. 5 cycles to read 16 bytes at 50mhz. This is 3x (or more) faster than main memory's maximum bandwidth, or something like 4.5x the non-turbo memory.
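The 160MB/s figure falls straight out of the burst timing; a small sketch of the arithmetic follows. The function name is made up, and the two main-memory numbers are only what the 3x and 4.5x ratios above imply, not measurements.

```python
def burst_bandwidth_mb_s(clock_mhz, timing, bytes_per_burst=16):
    """Peak bandwidth of one line burst, given per-beat cycle counts."""
    cycles = sum(timing)                       # 2:1:1:1 -> 5 cycles
    return bytes_per_burst * clock_mhz / cycles

cache_bw = burst_bandwidth_mb_s(50, (2, 1, 1, 1))

# Implied by the ratios quoted above, not measured:
implied_main_memory_bw = cache_bw / 3
implied_non_turbo_bw = cache_bw / 4.5

print(f"cache: {cache_bw:.0f} MB/s")
print(f"implied main memory ceiling: {implied_main_memory_bw:.1f} MB/s")
print(f"implied non-turbo memory: {implied_non_turbo_bw:.1f} MB/s")
```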
In testing on NeXT Turbo, unoptimized, I found around a 30% gain in POVRay and 15% on gzip compression. Roughly agrees with what I found on the Mac side. However, the cache being enabled as-is on NeXT causes immediate dysfunction of ethernet and disk corruption so my ability to test was limited.
Wow that's a lot of work to get to a dead end. :(
I think a few people here have a Nitro and a Pyro. Would having access to them to do some testing be something worthwhile to you? Perhaps some wouldn't mind giving you access on loan to tinker?
Quote from: zombie on June 17, 2025, 12:39:43 PMWow that's a lot of work to get to a dead end. :(
I think a few people here have a Nitro and a Pyro. Would having access to them to do some testing be something worth while to you? Perhaps some wouldnt mind giving you access on loan to tinker?
Yeah, I'd be lying if I said it wasn't quite dispiriting. I did come away with a few takeaways that can apply to the non-cache accelerator, at least. I'll probably still finish a snooping implementation because I want it for myself, and it has applications in the Mac environment, but I don't know that it'd make sense to make "in bulk" as the most compelling use-case would be for the non-turbo systems.
Not that the non-cached accelerator (working name of NeXT Overdrive) isn't a nice improvement already for non-turbo systems, but I'm greedy!
I wouldn't have any need for a Pyro, I've moved past that with my own design and I know what they're getting up to. At some point it would be good to dump the GALs and trace the connections just for archival's sake, but it's no longer needed.
I don't know that hands on a Nitro would be super beneficial given it all centers around the giant ASIC either. Really, I suppose the question regarding the Nitro is best couched as this:
Is there any proof of a Nitro accelerator ever having been run in a Non-turbo system?
If there is an example running in a non-turbo system, then they had a solution to the coherency problem.
If there is no proof, then they were probably snooping the bus which is only supported on the turbo chipset.
Hello Zig Zag Joe: "I don't know that hands on a Nitro would be super beneficial given it all centers around the giant ASIC either. Really, I suppose the question regarding the Nitro is best coached as this: Is there any proof of a Nitro accelerator ever having been run in a Non-turbo system?
If there is an example running in a non-turbo system, then they had a solution to the coherency problem.
If there is no proof, then they were probably snooping the bus which is only supported on the turbo chipset."
+++
I actually have a Nitro :) but I do not know if it even ran on a non-turbo motherboard. I have a new contact, an ex-NeXT engineer who worked briefly on the NRW at NeXT. I also have another ex-NeXTer I procured the Nitro from; I have sent him a message and pointed him at this thread to see if he has any ideas or may have an answer on the non-turbo support. Brian Archer may know as well.
The board is so rare I would be worried about damaging it via testing, kudos for your work on this! Peace Rob
Would the Previous emulator yield any clues about Nitro Compatibility with non turbo chipsets ?
Hello Rob and all,
the Nitro can only be run on Turbo systems. It would not successfully connect to non-Turbo models.
With some help from Andreas it's become clear NeXT used the software approach, where the cache invalidation routines have been updated to be specifically aware of a L2. Unfortunately it's a specific hardware approach at least in earlier versions of *Step so it would still require a patched kernel for my hardware unless I tried to perfectly emulate a Nitro board.
There's a bit of a problem in that my cache architecture was not intended to provide for direct access to the TAGs, but I could certainly contrive a special case that blows away the cache lines the DMA buffer could be located at even if the buffer wasn't actually in cache. While this is going to blow away some entries that didn't need to go away, it would still be the easier solution in that it requires no substantive changes to how the logic works internally.
I'd need to find a hole in the address map that I can use for invalidation but I should only need 128KB. I have enough decode that I could use a bit of the IO space for that.
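A minimal sketch of how such an invalidation window could behave, assuming a 128K direct-mapped cache with 16-byte lines. The base address, the names, and the kernel-side loop are all hypothetical; the point is only that a write at offset X in the window clears whichever line lives at index X, without checking the tag.

```python
CACHE_BYTES = 128 * 1024
LINE_BYTES = 16                      # assumed: 68040 cache line size
NUM_LINES = CACHE_BYTES // LINE_BYTES
INVALIDATE_BASE = 0x0200_0000        # hypothetical hole in the address map

valid = [False] * NUM_LINES
tags = [0] * NUM_LINES

def line_index(addr):
    return (addr % CACHE_BYTES) // LINE_BYTES

def fill(addr):
    i = line_index(addr)
    valid[i], tags[i] = True, addr // CACHE_BYTES

def is_cached(addr):
    i = line_index(addr)
    return valid[i] and tags[i] == addr // CACHE_BYTES

def write_invalidate_window(window_addr):
    """Model a write into the 128KB window: clear the line, ignore the tag."""
    offset = window_addr - INVALIDATE_BASE
    if not 0 <= offset < CACHE_BYTES:
        raise ValueError("address is outside the invalidation window")
    valid[line_index(offset)] = False

def invalidate_buffer(buf_addr, length):
    """What a patched kernel routine might do for one DMA buffer."""
    first = buf_addr - buf_addr % LINE_BYTES
    for addr in range(first, buf_addr + length, LINE_BYTES):
        write_invalidate_window(INVALIDATE_BASE + addr % CACHE_BYTES)

fill(0x0412_3450)   # first line of the DMA buffer, cached
fill(0x0414_3460)   # unrelated data sharing an index with the buffer's second line
fill(0x0400_8000)   # unrelated data elsewhere in the cache
invalidate_buffer(0x0412_3450, 32)
results = (is_cached(0x0412_3450), is_cached(0x0414_3460), is_cached(0x0400_8000))
print(results)
```

The middle entry is the collateral damage mentioned above: it was not part of the buffer but sat at an index the buffer could have used, so it gets cleared too. With a read-only cache that costs a refill, not data.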
I've been trying to think about snooping and it would be a mess logically. I need to be prepared to drop everything and invalidate/snoop an external access at more or less any time. Right now the internal bus state is uncontested and the 040 doesn't ever have to ask for the bus, even while DMA is in progress. Here's a picture from my Mac testbed.
This is (what passes for) a DMA burst from the NIC to main memory, but the entire time the cache is hitting everything the 68040 has asked for so the CPU is running at full speed internally instead of waiting on bus availability. It's basically a best case scenario, but in general I found most DMA activity to get a few cache hits before running into a transfer that required the bus. Bus arbitration is done in parallel to the cache activity so on a miss it should have the bus available at the point the need for an external cycle is known rather than incurring a slight penalty as the simple doubler type does.
The NeXT DMA doesn't steal the bus for extended periods like this, but I would still expect this to provide performance gain both due to less bus contention but also due to not needing to wait on arbitration to begin an internal bus cycle.
Snooping turns all of this on its head and makes a real dog's breakfast out of the sequencing.
On a (full) snoop cycle where data in cache is updated, address and data must flow through from host; 68040 is unable to use the bus at all while the cache is snooping.
Invalidation cycles are cheaper - when an address has been latched, get the internal bus idle, emit the address, check for a match, and clear validity bit. I think one of these could be dispatched in 40ns (2 cycles) since there's no need to actually do anything with the data.
In either case the bus needs to be able to be preempted from the 040 if it's waiting to take bus ownership due to the need for an external cycle. So a bus retry & re-arbitrate sequence must be issued to get it off the bus, before running the snoop operation, and then maybe resuming normal activity? Ugh. Or we have fully synchronized arbitration - external arbitration required before internal bus cycle can begin - putting the 50mhz 040 at the mercy of 25mhz arbitration timings, even for cache activity. Not worthwhile, as our 160MB/s bandwidth has just been halved or worse.
So that's basically the conundrum. Do I require a modified kernel, or sacrifice performance for a software-agnostic solution?
The upshot of the modified kernel approach is that it wouldn't preclude operation in non-turbo hardware, at least, but I don't know much about the software environment of NeXT beyond what I've seen in the ROM and early kernel source.
It's fascinating to read all the updates, many thanks for keeping us in the loop!
@zigzagjoe This is one of the most thorough investigations of the hardware I've seen. Also love that @andreas_g is able to provide his view of the world to help guide. Even if the accelerator finally cannot achieve what you wanted it's hard for me to see this as wasted effort. Absolutely fascinating exploration!
A number of people in the NeXT community have tried to find the kernel source code for NEXTSTEP 3.3 and OPENSTEP 4.2 without luck unfortunately. Let us know if you find it as it would make some people happy.
It's been a trip, that's for sure. Good news: with a patched kernel OpenStep4.2 is running, replies to pings, and hasn't corrupted its filesystem either (yet)!
The change here is I have hacked in (to my accelerator) a private address space that the kernel can use to invalidate cache lines. Support for this was added to support the Nitro in the Kernel so I perverted that to my own ends. At least NeXTSTEP 3.3+ seems to have this, I will need to look at each kernel version to see if they have it or not and make the necessary changes.
A modified kernel is required as my address space is located & configured differently from Nitro but the functionality is the same: the TAGs for the cache lines that DMA read buffers could be located at are indiscriminately cleared so the 040 will read fresh data, if that buffer had been located in L2. If not, we just blasted some cache for no reason. There is no helping that though and as a read-only cache there is no data lost.
Technically I could hack hooks into earlier kernels to work the cache but it'd be much more work. Given that it gracefully degrades - cache simply doesn't enable unless interacted with - I don't think it is important.
I've been rushing ahead as I wanted to get this working for VCF, so it will require a bit of backtracking to firm up hasty logic changes before I even think about optimizing and also validation of stability. But, this is heady news. Super-duper-preliminary numbers show a 30% improvement in compiling, povray, boot times, and interestingly in HTTP performance too. I think the HTTP performance improvement is probably due to lower bus contention and asynchronous processing being possible. Memory throughput in the L2 cache region increases to ~ 25 mb/s (from 4k through 128K) as measured by membench. NXFactor video performance has also increased to .836 from .676.
It definitely feels snappier and FTP performance in particular seems much improved too.
Quote from: zigzagjoe on June 19, 2025, 10:47:34 AMSupport for this was added to support the Nitro in the Kernel so I perverted that to my own ends. At least NeXTSTEP 3.3+ seems to have this, I will need to look at each kernel version to see if they have it or not and make the necessary changes.
Support for Nitro was introduced together with the Turbo machines. Therefore I am pretty sure it is included with NeXTstep 2.2 and later. Running 2.2 in Previous shows that the kernel checks for the Nitro address space.
Wow. I'm frankly stunned you pulled that together so quickly. This is fantastic!
:o I need this lol. Man so much development on NeXT since my hiatus
Hello ZigZagJoe: I'm up to speed on this whole thread, and kudos, sir.
I started writing again below, and I sold out of the 2024 Inside NeXT, so I'm sponsoring a run of the 2025 super extended edition as I believe I'll sell another 100 copies.
Permission to ZIGZAGJoe to land my NeXT hardware ,software mothership on your magnificent project .
I laugh because ZIG ZAGing a Sub or Boat is dubbed a crazy Ivan aka someone that wants to join forces like a super secret double agent.
I guess I'm crazy Ivan joining your project hopefully any questions on my dedication to NeXT may be found on my youtube.com/robblessin channel nearly 1000 NeXT and other project videos.
I'm going to try and think out side and inside the box on this one, how is it always 3am now 4am ha just woke up, I guess I really am in a Black Hole tonight.
It sounds like you are possibly a little thin on actual NeXT hardware in your shop needed for testing currently.
I would absolutely enjoy helping remedy the situation. I don't know if you saw my offer before in the 68K forums, and I hope I'm not jumping in too late.
What do you need, Joe? I would send you a Cube with an 030 and 040 recapped motherboard to use for your project. I also have:
- custom cables to flat panels
- sound cards
- Quokkas to let you use USB to NeXT peripherals
- ZULU or SCSI2SD cards etc.
- RAM etc.
- software out the wazoo, manuals
- socketed motherboards for Color and NeXT 25MHz mono stations
- new old stock power supplies and floppies
All I would ask in return :) once everything is firing on all cylinders, perhaps a couple of them. If you want to keep the hardware, I am happy to just outright donate 1 system to you, no strings attached, just cover shipping. If I can buy some of these in future and/or beta test them, hopefully I will be able to procure some of these magnificent accelerators.
That would be awesome.
I know you prefer to sell direct and
I like your cat the inspector BTW, I have one inspector kitty as well.
I miss little YODA inspector 2 though as he was one of those pets
into the middle of everything
never more than 6 feet from me.
Bella my other cat makes weird noises and will show
up with like of ram, or a cord or connector lol and she watches as I install it into a NeXT.
A little well deserved treat.
I need to get it on film as no one would believe it until they see it.
Epiphany
Holy Crap between your Processor 50Mhz accelerator and Nitros equally impressive hardware bus based mojo accelerator performance boost,
or either or I'm excited here.
My question is would it be possible to have both devices in the reality distortion field, coexist on the same motherboard , all this NeXT level exciting.
IMHO you know who would be really impressed with all of this Steve OG Jobs. A tribute to him ,
I know he was always burning rubber into the future claiming not to care about the past but deep down I know he had a soft spot for NeXT, it was his baby.
I actually own the Turbo Color Steve gave to Larry Ellison ,it is kind of funny because it has a Color Station board under the hood. If anybody is trying to ping us from the afterlife it is Steve , I'm sensitive to these things BTW. It runs in the family mom says it is a Scottish thing , and today I felt my dad's presence here it was trippy.
Brian Archer and I were chatting a few days ago about how amazing it is how far along all of this has come.
I'm looking at Previous development in awe. I've been doing this 32 years;
I'm that guy who carried the NeXT torch, and this stuff makes me happy.
Not trying to shift focus from the thread; I thought maybe it would give you a little different perspective, Joe, of what your project means to me personally. It is solid gold, thank you sir :)
Highlights of what is available if needed, Joe. BTW you can configure NeXTs to use a serial port as an alternative console by changing the system parameters. You mentioned netbooting: by default, when the BIOS battery is replaced, the NeXT defaults back to netbooting in graphic mode.
So you reset the system parameters: command ~, at the NeXT prompt type p for parameters, booten? change to SD.
It also lets you set verbose mode booting; be sure to change power on to yes to get verbose boot mode Yes to stick.
Also change the setting to allow the serial port to be used as an alternate console.
Set the me password, then log out and set the root password. Maybe it is been there, done that.
Projects: I'm going to jump off and get some sleep, but it is Sunday,
I'll be back. Let me know on the hardware please; you can DM me, or best is bhi1@ix.netcom.com
Oh, by the way, I made a QR code to make it easy to register here in the future.
Nitro has it; hopefully it makes the process easy, and it will be on our home page.
Keeping registration on an independent page gives authentication security and
prevents spammers and hackers from targeting NeXTcomputers.org round the clock.
I know it would have helped you, Joe.
Usiketos Verilog HDMI (one of these is eventually coming),
Brian Archer's NU I/O sound,
the Quoka USB Peripheral to NeXT,
the ZULU 2350 Blaster,
SCSI2SD 6.4 and
BLUE SCSI.
Joe, question for you: what in the heck is attached to the SCSI connector on that Frankenstein board? Did you design your own implementation of an SD card hardware emulator?
That is what it looks like to me; my guess is yes.
Umberto's DSP 256 chip that fits in the open slot.
One observation: do you know you can boot the NeXT up as a Macintosh, running up to Mac OS 8.6 natively, thanks to Simon Shuebig, a jedi coder, and his code for DayDream, the product of which I speak :) DayDream's original implementation had an Apple ROM dongle hanging off the DSP port.
Simon S may be a valuable resource here.
It may be a face palm moment. So, circling back:
I'm sitting on several hundred NeXT boxes over here, including many Cubes.
If you need anything in terms of NeXT hardware, I would be happy to donate and loan to you in exchange for some of your accelerator boards.
Apologies for not being more proactive; hope some of the above is relevant. There was a thing called a Black Box that would have allowed for 512 MB of RAM to hang off the SCSI. I never saw it; not sure if it was vaporware.
http://www.kevra.org/TheBestOfNext/ThirdPartyProducts/ThirdPartySoftware/InputOutputAndStorage/BlackBox/BlackBox.html
Had this at VCFSW last weekend :) Had a slight filesystem corruption bug which I've just fixed, as I was looking at IDA then hex editing the kernel for my cache-invalidation functionality. A single missed bit. The loop is very inefficient as written, so I will need to go back and fix that. Hopefully I'll have some of the home-baked benchmarks soon.
Yes, it was a little vain of me but I thought a black computer needed a black SCSI emulator so I made a variant of Rabbit Hole Computing's ZuluSCSI OSHW design in black with my own specific touches. It served as a testbed for a laid down RP2040.
The black box appears to be a DSP with local RAM so it wouldn't function as a RAM expansion for the system proper (that would be very slow, anyways). Unfortunately the NeXT Memory map is rather packed so it would be impossible to add additional banks of RAM without major changes. I did see the daydream topic the other day. That is impressive, I've got an inkling of what would have had to go on "behind the scenes" to make that possible and am surprised they had official Apple support to make it work.
My OverDrive/HyperDrive accelerators wouldn't be able to co-exist with the bus overclock board Nitro has come up with. They operate the CPU at 2x bus speed, but the 040 is only good for a maximum of 55 MHz or so. Conceptually it would be possible to make a board that adds cache only (@ 50 MHz system bus), but it would require adding wait states to most accesses: you need to look at the access first to determine if it is cacheable or not, and if it is an eligible access, whether it is in cache or not. Additionally, it would be exactly as complex as the current design, as bus buffers are required for reliable high speed operation and the multiplexed bus. I don't anticipate working on this as I've already got a lot of work ahead of me.
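To put rough numbers on the clock side of that, here is a minimal sketch of the arithmetic, not anything taken from the actual board design. The 2x multiplier and the roughly 55 MHz ceiling are the figures quoted above; the function names and the two example bus clocks (25 MHz and 50 MHz) are assumptions picked purely for illustration.

```python
# Figures quoted in the post above: CPU runs at 2x the bus clock,
# and the 68040 is only good for roughly 55 MHz.
CPU_MULTIPLIER = 2
MAX_040_MHZ = 55.0  # approximate ceiling, not a datasheet value


def doubled_cpu_clock(bus_mhz):
    """Return the CPU clock (MHz) that results from doubling the bus clock."""
    return CPU_MULTIPLIER * bus_mhz


def within_040_limit(bus_mhz):
    """True if the doubled clock stays at or under the ~55 MHz ceiling."""
    return doubled_cpu_clock(bus_mhz) <= MAX_040_MHZ


# Hypothetical bus clocks, for illustration only.
for bus in (25.0, 50.0):
    cpu = doubled_cpu_clock(bus)
    verdict = "ok" if within_040_limit(bus) else "too fast for the 040"
    print(f"bus {bus} MHz -> CPU {cpu} MHz: {verdict}")
```

Under those assumptions, any bus clock above 27.5 MHz pushes a doubled CPU clock past the ceiling, which is why a clock doubler and a bus overclock do not stack.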
As far as hardware goes, right now the weak point is that I don't have peripherals for the non-turbo system I have. I have a monitor + keyboard and mouse due to me but the fellow with them hasn't been able to make the road trip yet, so I have been doing all my testing on non-turbo via serial port as all I have is the slab itself. That has been fine for the time being though as I still have a couple of issues to iron out there, if I had not gotten sidetracked with the cache :)
I will probably need to do some closer mechanical testing with a cube eventually, so I may take you up on your offer to loan one out. For now I have my work cut out for me. Maybe I will eventually need to borrow a sound box replacement for the non-turbo as well.
While not directly related, I am hunting for an N4006 monitor power supply board, as the one for mine is too far gone to fix, I think.