19

Making the CPU was easy :(
Porting GCC is damn hard.
Hotel? Trivago 😭

Comments
  • 2
By making a CPU, do you mean writing the whole control logic for a standard ISA, or just forking some repository to make changes?

    Also, just curious since I started learning some things about RISC-V: do you have a suggestion for a Verilog simulator? I'm planning to use Verilator or Icarus Verilog, just wondering if there are other open source options.
  • 4
    @hardfault

    I wrote the whole ISA. It's a CISC-based architecture, 32 bits. Simple enough for an FPGA.

    Verilator is the main open source simulator. Also, use whatever the repository is using; usually they have the whole RTL simulation process automated for you.

    I wrote the core in SystemVerilog and used ModelSim as the simulator, also because I'm using an Altera FPGA and ModelSim includes the necessary libraries to simulate the DDR2 controller and other IPs that Altera offers.
  • 5
    Calculating the antenna won't do anything, you need to input the haptic SMTP bus. If you program the capacitor, it can get to the SMTP spyware through the cross-platform SMS panel! Basically, compress the high speed COM internet, and that should synthesize the UDP bandwidth.
  • 5
    Too much jargon for me 🥴
  • 1
  • 2
    @Bybit260 you read that right. 😌
  • 0
    @akshar yeah, but.. Which antenna?
  • 5
    @Bybit260 he's just trying to be funny by jumbling together unrelated words into sentences; by this he's implying "this is how it feels when some people on devRant talk."
    Aka he's being sarcastically funny
  • 2
    @Bybit260 why go with a CISC-based architecture?
  • 1
    @rutee07 ..true.
  • 0
    @matt-jd CISC is better for performance because of memory bandwidth and latency: higher code density means you can sustain higher performance with less memory bandwidth.

    RISC is just better for low power devices.
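    The code density point can be sketched in Python (toy byte counts only; these encodings are made up for illustration and don't match any real ISA):

```python
# Toy illustration of the code-density argument (made-up encodings,
# not any real ISA): the same work encoded as fixed-width 32-bit
# "RISC" words vs. variable-length "CISC" bytes.

# Task: load a value from memory, add an immediate, store it back.

# Fixed-width encoding: every instruction is 4 bytes.
risc_program = [
    b"\x00\x00\x00\x01",  # LW   r1, 0(r2)
    b"\x00\x00\x00\x02",  # ADDI r1, r1, 5
    b"\x00\x00\x00\x03",  # SW   r1, 0(r2)
]

# Variable-length encoding: one memory-to-memory add-immediate can do
# all of it in a single instruction (opcode + address + immediate).
cisc_program = [
    b"\x81\x10\x00\x05\x00",  # ADD [addr], 5  (5 bytes total)
]

risc_bytes = sum(len(i) for i in risc_program)
cisc_bytes = sum(len(i) for i in cisc_program)
print(risc_bytes, cisc_bytes)  # 12 vs 5: denser code, fewer bytes fetched
```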
  • 5
    @Bybit260 not true, pretty much every high performance processor right now is RISC. ARM and x86 processors are both sort-of-RISC, as are most GPUs. What you lose in memory bandwidth you gain in ease of scheduling and faster, simpler, more predictable control logic (try doing superscalar out of order with pure CISC, it's a nightmare)

    (x86's CISC ISA is decoded internally into a bunch of RISC-like micro-operations, which are the things actually being executed)

    (for that matter one would think the ISA's effect on memory bandwidth matters more for low power processors which have constrained buses and caches than for huge behemoths with wide buses and huge caches, but you say RISC is useful there)
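    The micro-op cracking idea can be sketched roughly like this (a hypothetical Python model, not a real x86 decoder; the opcodes and micro-op names are invented):

```python
# Hypothetical sketch of micro-op cracking: a CISC-style instruction
# with a memory operand is split into simple RISC-like micro-ops that
# the out-of-order core actually schedules.

def crack(instr):
    """Split one CISC-style instruction into RISC-like micro-ops."""
    op, dst, src = instr
    if op == "add_mem":  # ADD [dst], src -- read-modify-write memory
        return [
            ("load", "tmp", dst),    # tmp <- mem[dst]
            ("add", "tmp", src),     # tmp <- tmp + src
            ("store", dst, "tmp"),   # mem[dst] <- tmp
        ]
    return [instr]  # register-only ops pass through unchanged

uops = crack(("add_mem", "0x1000", "r1"))
print(len(uops))  # one CISC instruction became 3 micro-ops
```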
  • 1
    @RememberMe Yeah, you just explained to me how a modern CISC CPU works. So what is not true about "CISC CPUs are better for performance"?

    CPUs are required to do very linear tasks, so it makes sense to do more work in a single instruction. Note that it also depends on how you implement the hardware architecture; CISC is just a term describing the ISA, and the implementation can be very different on the hardware side.

    Even though CISC architectures are in some way RISC inside, that's too much of an oversimplification, because every CISC CPU has microcode. The way the architecture handles that microcode is analogous to a RISC CPU, but a RISC core is a Turing-complete general-purpose machine, while the microcode on a CISC core is just a recipe for handling the CISC instruction; it never changes unless you reprogram it.
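    The "microcode is a fixed recipe" point can be modeled as a lookup table (a Python sketch with invented opcodes and micro-op names, just to show the contrast with a programmable core):

```python
# Sketch of "microcode is a fixed recipe": the microcode ROM is just a
# lookup table from opcode to a canned micro-op sequence. It never
# changes at runtime, unlike a program running on a general-purpose
# RISC core. (Hypothetical opcodes and micro-ops.)

MICROCODE_ROM = {
    "string_copy": ["load_byte", "store_byte", "inc_src", "inc_dst",
                    "dec_count", "branch_if_nonzero"],
    "push": ["dec_sp", "store_word"],
}

def sequence_for(opcode):
    # The sequencer only replays the stored recipe; it cannot compute
    # anything new -- "reprogramming" means rewriting the ROM itself.
    return MICROCODE_ROM[opcode]

print(sequence_for("push"))
```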
  • 2
    @RememberMe

    On the GPU side of things, what you said is true, but I was talking about CPUs. Obviously, you won't put 4000+ big ass CISC cores on the same silicon die. Also, RISC cores were chosen for GPUs because the tasks are parallel and usually divided between all cores on the GPU, not loaded onto a single core.
  • 1
    @Bybit260 you're right, but that's not the point here. I'm pointing out that a CISC *ISA* does not necessarily equal better performance for high performance stuff.

    For one thing, you need to burn a lot of silicon for an extremely complex decoder stage (because if you go the *naive* microcode way you lose out on OOO so you kinda need to go micro-ops but your decoder now needs to translate ISAs) which is hard to make wide. This is one place where RISC vs CISC really shows, because the stages after that are similar in both. RISC does lose out on memory bw (not true with compressed RISC though) but literally everything else is easier (eg. fast, wide decoders are trivial). And with modern high performance processors having massive memory buses and icaches that can comfortably fit a good amount of RISC code and have hundreds of operations in flight at any given time that's not all that big an issue anymore (this was not true earlier).

    Of course as usual the correct answer to this is "it depends" (on your use case and implementation), but I'd also like to point out that after x86 and VAX there haven't really been any new CISC ISAs. You'd think they'd show up in new high performance designs, but x86 is only like that for compatibility.

    As a related point, RISC ISA has the advantage of making code scheduling and compiling more obvious.

    Finally, modern high performance is about energy efficiency. A blazing fast, white hot core would be pretty useless overall unless the performance advantage over slower cores that sip energy is truly astronomical, and RISC ISAs lend themselves to more energy efficient designs in general (another massive "it depends" here though).

    If anything I think it's not that relevant a difference right now given that a ton of research has gone into CISC-to-RISC dynamic translation (except in efficiency I guess).
  • 2
    @RememberMe You're right.

    I designed RISC cores, because everything is so simple to implement.

    Conclusion: Hardware is fucking hard.
  • 1
    @RememberMe I take everything back lol. CISC is a fucking pain to work with. Fuck CISC, it's too fucking complicated.
  • 1
    @Bybit260 haha! Indeed it is. What are you having trouble with?
  • 1
    @RememberMe Pipelining the instruction fetch logic. I don't even think it's possible. It requires a state machine, it's slow, it's a pain. All data for an instruction has to be fetched before beginning the next one, so no luck at superscalar. I can't even imagine implementing OoO. Everything is a nightmare.
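    The serialization problem can be shown with a toy Python decoder (a made-up encoding where the low nibble of the first byte gives the instruction length; purely illustrative):

```python
# Why variable-length fetch is painful to pipeline: you can't know
# where instruction N+1 starts until you've decoded enough of
# instruction N to learn its length. This toy decoder walks the byte
# stream sequentially for exactly that reason. (Made-up encoding:
# low nibble of the first byte = total instruction length.)

code = bytes([0x12, 0xAA,              # 2-byte instruction
              0x34, 0x01, 0x02, 0x03,  # 4-byte instruction
              0x11])                   # 1-byte instruction

def fetch_all(mem):
    pc, instrs = 0, []
    while pc < len(mem):
        length = mem[pc] & 0x0F        # must decode the length first...
        instrs.append(mem[pc:pc + length])
        pc += length                   # ...before the next fetch can start
    return instrs

print([len(i) for i in fetch_all(code)])  # [2, 4, 1]
```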
  • 1
    @Bybit260 hmm, is your fetch only a single instruction wide? Could you make it fetch a bigger chunk of memory in a go, and say decode like two or three instrs from there?
    If you're concerned about main memory bandwidth, use an instruction cache here.
    (You can use a combinational read, sequential write block of memory to model a cache as a first approximation).

    Look up the MIPS R10k paper for a nice clean-ish OoO. It's not easy to do even for RISC, and pretty much a no go for CISC.
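    The wide-fetch suggestion can be sketched in Python (a toy model, not real RTL; the chunk size and 4-byte fixed width are example parameters):

```python
# Sketch of the wide-fetch idea: grab one aligned 16-byte chunk per
# cycle and slice it into four 4-byte fixed-width instructions. With
# a fixed width, the slice boundaries are known up front, so all four
# slots can be decoded in parallel. (Toy model, not real RTL.)

WIDTH = 4  # bytes per instruction (fixed-width ISA)

def fetch_chunk(mem, pc, chunk_bytes=16):
    chunk = mem[pc:pc + chunk_bytes]
    # Slot boundaries are pc, pc+4, pc+8, ... -- independent of content,
    # unlike a variable-length ISA where each boundary depends on decode.
    return [chunk[i:i + WIDTH] for i in range(0, len(chunk), WIDTH)]

imem = bytes(range(32))  # stand-in instruction memory
slots = fetch_chunk(imem, 0)
print(len(slots), [len(s) for s in slots])  # 4 slots of 4 bytes each
```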
  • 1
    @RememberMe Bro, thank you. How did that not cross my mind at first? lol. Also, wonderful MIPS R10K paper, just what I needed.