Reading from the framebuffer?

Yeah, that's why I asked. Quad division is really tricky, not including the fact that there's no hardware UV texture support.



Okay, that makes sense. I guess that keeps you from idling both on the VDP1 and CPU.



It's on the CPU-bus, sadly.





That's insane. Where is this negative feedback coming from? With anything, you're going to get your percentage of idiots who don't know what they're talking about. You know you've made it to the big leagues when you start getting death threats. Don't let that discourage you. Really, work on what makes you happy.

Another thing is that the game looks like a vertical slice rather than a tech demo. Some people may have a hard time understanding that. If it was more of a prototype, it might allow people to have a better understanding that the game is of course still in progress. But then again, people are stupid.

Well, it's always the same things :
-Why didn't you include the fisheye lens? (Sega themselves couldn't do it but people still expect you to pull a miracle)
-Why don't you use sprites instead? (seriously??)
-The game should focus on exploration instead of reaching an end goal.
-The game should focus on speed instead.
-Why don't you make Sonic Utopia on Saturn instead? (never heard of RAM? Limited draw distance?)
-Why don't you port Sonic Mania instead?
-Where are the bosses?
etc.

So it's super annoying to push an update just to receive 100 comments from people already nitpicking, asking for something else, complaining, asking for more, etc. I'd rather move away from this project, as it's not even fun to do anymore. It feels like I'm working.

Anyway, about the BSP tree, I'm pretty sure that's what the Slavedriver did, as you can see the maps have all sorts of weird small polygons/triangles all over the place, which shows that it was automatically processed, so the most likely answer is a BSP tree.

So right now I'm undecided between 2 options:
-Manually placing portals, keeping an octree and just using a PVS for occlusion culling (more work, more overdraw, but way faster for the CPU and less quad subdivision).
-Using a full BSP compiler, so everything is automated, with minimum overdraw but slower CPU operations and more quad subdivision (more vertices, more draw commands, etc.).
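
For the first option, the PVS lookup itself can be very cheap. Here is a minimal sketch, assuming the map is split into a fixed number of sectors and that visibility between sectors was precomputed offline. `Pvs`, `NUM_SECTORS`, `pvs_set_visible` and `pvs_is_visible` are hypothetical names for illustration, not taken from any existing engine.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SECTORS 64 /* hypothetical sector count */

/* One bit per (from, to) sector pair, filled in by an offline tool. */
typedef struct {
    uint8_t bits[NUM_SECTORS][(NUM_SECTORS + 7) / 8];
} Pvs;

/* Mark sector `to` as potentially visible from sector `from`. */
void pvs_set_visible(Pvs *pvs, int from, int to)
{
    assert(from >= 0 && from < NUM_SECTORS);
    assert(to >= 0 && to < NUM_SECTORS);
    pvs->bits[from][to >> 3] |= (uint8_t)(1u << (to & 7));
}

/* Returns 1 if sector `to` may be visible from sector `from`, else 0. */
int pvs_is_visible(const Pvs *pvs, int from, int to)
{
    assert(from >= 0 && from < NUM_SECTORS);
    assert(to >= 0 && to < NUM_SECTORS);
    return (pvs->bits[from][to >> 3] >> (to & 7)) & 1;
}
```

At run time the renderer would only walk the sectors whose bit is set for the camera's sector, which is one shift and one mask per sector.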

From my own experience with Saturn Quake maps, 64x64 textures really push the Saturn hard, while 32x32 makes it much easier. The Slavedriver engine used only 64x64 textures with Gouraud shading everywhere, which shows how good it is. I think its bottleneck is the CPU, though, since it also slows down in emulators: high-quality textures with Gouraud shading never slow emulators down for me, while CPU issues do. So with 32x32 textures, overdraw isn't as bad.
 
I hope you don't get disheartened, the stuff you've been doing with Sonic Z-Treme is some of the most impressive homebrew I've seen for the Saturn (or most consoles for that matter).
 
@XL2

Is that a render-to-texture of the VDP1 framebuffer in this screenshot? Is it like the megaTV in Destruction Derby 1 on the PS1?

Is it possible to do "animated" render-to-texture on the Saturn at a good speed, or with an acceptable penalty? What do you think?

I think it is only practical with VDP1. Render-to-texture of the whole final VDP2 output may be possible, but I don't think it is usable. Very, very slow.
 
It reads the framebuffer, skipping some pixels, into work RAM. I then send that buffer to VRAM with SCU indirect DMA during V-blank and display it as a sprite.
It's updated each frame, so you get a picture in picture in picture.
VDP2 would be the same if you use a bitmap layer (NBG0, NBG1 or RBG0) at 16 bpp.
You can zoom 4 times I think, so a small (like 88x64) image could fill the screen, but yes, reading the framebuffer is quite slow.
You also have the problem of palette code pixels not working for an RGB layer/sprite and vice versa, so to do it like Destruction Derby on PS1 you would lose the background and RBG0 layers unless you process it a bit more.
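
The "skipping some pixels" read can be sketched as a plain subsampling loop. This is only an illustration under assumptions: `fb_subsample` is a hypothetical helper, the source is treated as an ordinary 16 bpp array, and on real hardware `src` would point at the VDP1 framebuffer and the copy would be much slower than from normal RAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy every `step`-th pixel of every `step`-th line from a 16 bpp source
 * (src_pitch pixels per line) into a tightly packed destination buffer.
 * Returns the number of pixels written to dst. */
size_t fb_subsample(const uint16_t *src, int src_pitch, int src_w, int src_h,
                    int step, uint16_t *dst)
{
    size_t n = 0;

    assert(step > 0 && src_pitch >= src_w);
    for (int y = 0; y < src_h; y += step) {
        const uint16_t *row = src + (size_t)y * (size_t)src_pitch;
        for (int x = 0; x < src_w; x += step)
            dst[n++] = row[x];
    }
    return n;
}
```

With `step` set to 4, a 352-pixel-wide line shrinks to 88 pixels, which matches the 88-wide example size mentioned above.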
 

What you say here is very interesting, XL2. Sorry for the delay in answering you.

A few "conclusions" I find interesting in your answer:

1) It can be used to achieve the Destruction Derby effect or something similar.

2) It can be used to get part of the Burning Rangers "transparency layer" effect.

3) When you say that reading the framebuffer is slow, do you mean what you do now, in other words reading the VDP1 framebuffer? Or the final VDP2 framebuffer?
I understand that both are slow, the VDP1 one less so than the VDP2 one. But could it be used without too much of a penalty, by balancing the resolution and the rate at which the framebuffer is read?

4) The problem you mention about capturing in front of VDP2 is clear. Since we capture the "instant" before the image goes to VDP2, we would not have the background information, and the palette or MSB-on pixels meant for VDP2 would look "weird". I thought about this a long time ago. As a solution, I thought the function that reads the framebuffer could draw a predefined "background color", equal to or representative of the "real" VDP2 background, wherever it finds a palette pixel, an MSB-on pixel or a transparent pixel. It clearly won't be the real background, but it will be better than a black spot or strange colors.
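
The idea in point 4 could look something like the sketch below. It assumes a 16 bpp capture where bit 15 set marks an RGB pixel and anything else is a palette code or a transparent pixel; that test would need adjusting to the sprite type actually in use. `fb_fill_non_rgb` and `RGB_FLAG` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RGB_FLAG 0x8000u /* assumption: bit 15 set means "RGB pixel" */

/* Replace every non-RGB pixel (palette code or transparent) in a captured
 * buffer with a fixed background color. Returns how many were replaced. */
size_t fb_fill_non_rgb(uint16_t *pixels, size_t count, uint16_t bg_color)
{
    size_t replaced = 0;

    assert((bg_color & RGB_FLAG) != 0); /* the fill color must itself be RGB */
    for (size_t i = 0; i < count; i++) {
        if ((pixels[i] & RGB_FLAG) == 0) {
            pixels[i] = bg_color;
            replaced++;
        }
    }
    return replaced;
}
```

This would run on the work RAM copy, after the read and before the transfer to VRAM, so it adds CPU time per captured pixel.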

Regards,
 
I tried fetching the framebuffer using SCU DMA, skipping lines, but it didn't work.
It did work in SSF, but Nova tells me it's an illegal operation to SCU DMA from the frame buffer (0x25C80000), and it just doesn't work in Yabause.
It is my understanding that 0x25C80000 is frame buffer 0, but it should switch every frame between 0 and 1, and I have no idea what register to look at to see the current back buffer.
Does anyone know how I should proceed?
SCU indirect DMA can skip some pixels (4 bytes I think), but according to the technical documentation it doesn't work with the B-bus.
DMA transferring the whole framebuffer every frame is too crazy (256 KB), and there is also the issue of byte alignment if used with NBG layers.
The idea would be to send it either to an NBG0/1 layer, to VDP1 RAM or to an H-RAM buffer (I'm mostly toying with it with no clear intention at the moment).
Any ideas?

On a side note, is it possible to fetch the complete image (VDP1+VDP2) sent to TV?
 
That's odd. From what I've seen in the SCU restrictions, VDP2 VRAM cannot be read via SCU-DMA.

One thing to note is that you cannot read while VDP1 is drawing. To know whether VDP1 is finished drawing, use the SPRITE END IRQ, or poll the EDSR register. Then use PTMR to force stop drawing.
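
Polling EDSR could be sketched as below. The register address and the bit position are written from memory of the VDP1 manual (EDSR at offset 0x10 from 0x25D00000, with CEF as bit 1), so treat both as assumptions and check them before relying on this. The function takes the register pointer as a parameter so the logic can be tried off-target; `vdp1_draw_done` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VDP1_EDSR_ADDR 0x25D00010u /* assumption: EDSR address on hardware */
#define EDSR_CEF       0x0002u     /* assumption: "current end fetched" bit */

/* Returns 1 once the draw-end bit is set in the value read from EDSR. */
int vdp1_draw_done(volatile const uint16_t *edsr)
{
    assert(edsr != NULL);
    return (*edsr & EDSR_CEF) != 0;
}
```

On the Saturn the call would be something like `while (!vdp1_draw_done((volatile const uint16_t *)VDP1_EDSR_ADDR)) { }` before touching the framebuffer.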

What does hardware say? Mednafen? I'd also look at what Burning Rangers did... for sure they're doing what you're intending, which is copying a sampled FB copy to VDP2 VRAM.

As for your last question, I would love to know. I doubt it, but there's something like EXBG on the VDP2? It would've been amazing to have direct access to the final output.
 
@mrkotfw Rayman does a render-to-texture of VDP1+VDP2 for the 3D fade animation before and after loading. If you know how to investigate what it does, that would be great. It seems very slow, because seconds pass until the animation is done. Thank you!
 
The physical address of VDP1 VRAM in the memory map is 0x5C00000. 0x25C00000 is the CPU's cache-bypassing alias.

Edit: See section 8.3, "Address Space and the Cache" of the SH7604 manual.
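
To make the aliasing concrete, here is a small sketch of the address arithmetic, assuming the SH7604 scheme where the top 3 address bits select the cache behaviour and 001 means cache-through. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SH2_REGION_MASK   0xE0000000u /* top 3 bits: CPU-internal region select */
#define SH2_CACHE_THROUGH 0x20000000u /* region 001: bypass the cache */

/* Strip the CPU-internal region bits to get the address seen on the bus. */
uint32_t sh2_physical(uint32_t cpu_addr)
{
    return cpu_addr & ~SH2_REGION_MASK;
}

/* Build the cache-bypassing CPU alias of a physical address. */
uint32_t sh2_uncached(uint32_t phys_addr)
{
    assert((phys_addr & SH2_REGION_MASK) == 0);
    return phys_addr | SH2_CACHE_THROUGH;
}
```

So `sh2_physical(0x25C00000)` gives 0x05C00000, the address that anything other than the CPU (an SCU DMA source, for example) would be expected to use.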
 
Thanks, but in SGL too the sprite VRAM base address is seen as
#define SpriteVRAM 0x25c00000

Changing it didn't change anything in fact.
It seems like it's an emulator issue, Yabause doesn't support it and Nova says SCU DMA reading from "VDP2 RAM" is illegal, but the author of Nova, Steve Kwok, told me it will be fixed in 0.5. Yabasanshiro does support the operation, so does SSF.
It allows me to read a part of the screen and send it to a work ram buffer, before sending it to vram after vblank out.

I'll have to test on real hardware to see if it behaves correctly.
What I don't get is that choosing Frame buffer 0 (or 1) all the time has no impact.
Is it because the system just redirects the DMA access to the other buffer?
Or is it just going to crash on real hardware?

(In the images, the size of the sprite is 176x80)
 

Attachment: FB_demo.png
 
Thanks, but in SGL too the sprite VRAM base address is seen as
#define SpriteVRAM 0x25c00000
That macro is defined for code running on the CPU. The CPU must bypass the cache when accessing peripherals to avoid bad side effects (e.g. accessing forbidden addresses, returning stale data). Page 4 of the VDP1 manual clearly states
VDP1 is at the absolute address 5C00000H of the system.
It's of course possible that the rest of the hardware doesn't fully decode addresses, and ignores the 3 uppermost bits, but it definitely doesn't know about any CPU-internal addressing schemes.

IIRC there's an errata or technical note that says you can't use SCU DMA to read from VDP2 memory. It doesn't say what the effect is if you try, though. There's also some limits on access widths to keep in mind.
 
SCU errata 1 says "Write to the A-Bus by the SCU-DMA is prohibited", and 2 says "Read from the VDP2 area by SCU-DMA is prohibited". The document doesn't detail what happens if you break these restrictions.
 
Thanks, I'll use 0x5C80000 then.
So, it is "normal" that using Framebuffer 0 (0x25C80000) all the time works even if it's supposed to be the displayed buffer?
About the timings: right now I use SCU direct DMA to copy the framebuffer (0) to work RAM after V-blank out (slSynch). I have a loop (like y=0; y<80; y++) that DMAs 172 pixels for each scan line, for a total size of 172*80*sizeof(Uint16). I then transfer that buffer to VRAM (both as a VDP1 sprite and as an NBG0 layer, both are working in emulators).
I think SGL stacks them, so no need to wait, but all this means I'm 2 frames "late" as I understand it.
I will have to find a way to start drawing, interrupt, transfer the buffer and then restart drawing on the same frame buffer to try that Burning Rangers effect.
It runs fast in emulators (no noticeable slowdowns), but the DMA is much slower on real hardware, so what I'm doing might not work at all.
Any suggestions to speed it up?
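
For reference, the per-line copy described above can be sketched like this. It is only a stand-in: `memcpy` takes the place of the SCU DMA call, `FB_PITCH` assumes a 512-pixel-wide framebuffer line, and `fb_copy_rect` is a hypothetical name rather than an SGL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FB_PITCH 512 /* assumption: pixels per framebuffer line */

/* Copy the top-left width x height rectangle of a 16 bpp framebuffer into
 * a tightly packed buffer, one transfer per scan line.
 * Returns the total number of bytes copied. */
size_t fb_copy_rect(const uint16_t *fb, int width, int height, uint16_t *dst)
{
    assert(width > 0 && width <= FB_PITCH && height > 0);
    for (int y = 0; y < height; y++) {
        /* On hardware this line would be one DMA transfer instead. */
        memcpy(dst + (size_t)y * (size_t)width,
               fb + (size_t)y * FB_PITCH,
               (size_t)width * sizeof(uint16_t));
    }
    return (size_t)width * (size_t)height * sizeof(uint16_t);
}
```

For the 172x80 case that is 80 transfers of 344 bytes each, 27,520 bytes per frame in total.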
 
So, it is "normal" that using Framebuffer 0 (0x25C80000) all the time works even if it's supposed to be the displayed buffer?
Yes. You can only access the back buffer, and it's always mapped to the same address. See section 2.1 of the VDP1 manual.
 
Ok, thanks! I guess I misunderstood what it did, I thought it swapped buffers, so like FB 1 would become the back frame buffer.
One less problem to worry about...
 
So it's working fine on real hardware, and it seems to have very little impact on the framerate (no slowdowns).
I doubled the width, which is why it looks weird, like it's ghosting. I haven't implemented the full Burning Rangers technique yet, but I have a pretty clear idea of how I will pull it off and it shouldn't be too hard (except for the overdraw, meaning transparent objects over opaque objects, but that's where my BSP engine will be useful once I port it).
I won't swap the frame buffers. I will just write to an unoccupied area (from x:352 to x:511), then copy it and scale it by 2.2.
I copy the framebuffer (176x112, but it will be 160x112) to work RAM, then DMA it during blanking to the NBG0 layer.
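
The numbers fit together: the unused area from x:352 to x:511 is 160 pixels wide, and 352 / 160 = 2.2. As a rough illustration of that scale factor, here is the column mapping a software nearest-neighbour scale would use. This is only a sketch of the arithmetic, not a description of how the scaling is actually done in the demo; `src_column`, `SRC_W` and `DST_W` are hypothetical names.

```c
#include <assert.h>

#define SRC_W 160 /* width of the off-screen capture area */
#define DST_W 352 /* width of the visible screen */

/* Map a destination column (0..351) to its source column (0..159). */
int src_column(int dst_x)
{
    assert(dst_x >= 0 && dst_x < DST_W);
    return dst_x * SRC_W / DST_W;
}
```

Every source column ends up covering 2 or 3 destination columns, which averages out to the 2.2 factor.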
 

Attachments: 20181130_172736.jpg, 20181130_171031.jpg
 
Are you copying a non-paletted render?

When it comes to rendering off screen, do you set the system clipping (or user clipping) command before rendering?
 
The frame buffer is 16 bits, so I just use RGB codes (including CLUT). The system clipping must be set wider and it must be the first draw command, else it doesn't work. That's one (small) issue I'm facing with SGL, as I need to get around its restrictions: it doesn't let me change the system clipping or the polygon used for clearing the frame buffer, so I need to overwrite some draw commands. I will need to find where it keeps those in work RAM. And of course, you also need user clipping commands to prevent drawing at the wrong place on screen.
 