Published (last): 27 May 2015
Kogge Stone Adder Tutorial
How do modern computer CPUs add numbers? I took classes on this in school, so I had a basic understanding, but the more I thought about it, the more I realized that my ideas about how this would scale up to 64-bit computers would be too slow to actually work.
I started digging around, and even though wikipedia is usually exhaustive and often inscrutable about obscure topics, I had reached the edge of the internet. I had to do actual research of the 20th-century kind. So come with me over the precipice and learn — in great detail — how to add numbers!

Adding in binary

For big numbers, addition by hand means starting on the rightmost digit, adding all the digits in the column, and then writing down the units digit and carrying the tens over.
This works the same in binary, but the digits can only ever be 0 or 1, so the biggest number we can add is 1 plus 1. In fact, the biggest possible case, 1 plus 1 with a carried 1, is 3, or binary 11. That still only carries a 1, which is convenient, because it means the carry can be represented in binary just like every other digit.
We can make a logic table for this:

A B C | Sum Carry
0 0 0 |  0    0
0 0 1 |  1    0
0 1 0 |  1    0
0 1 1 |  0    1
1 0 0 |  1    0
1 0 1 |  0    1
1 1 0 |  0    1
1 1 1 |  1    1

It gives you a bit more intuition when dealing with logical equations, which will come up later.
One way to think of it is: according to the logic table we just made, the sum should be 1 if there is an odd number of incoming 1s, and the carry should be 1 if at least two of the incoming digits are 1.

Adding in circuitry

The most straightforward logic circuit for this uses a 3-input XOR gate for the sum (assuming you have one) and AND/OR gates for the carry. And if we put a bunch of them in a row, we can add any N-bit numbers together! Starting along the top, there are four inputs each of A and B, which allows us to add two 4-bit numbers.
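That sum-and-carry logic can be sketched in a few lines of Python — a behavioral model rather than actual gates, but the expressions mirror the circuit:

```python
def full_adder(a, b, c):
    # Sum is 1 when an odd number of inputs are 1 (a 3-input XOR);
    # carry is 1 when at least two inputs are 1 (a majority function).
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry
```

For example, full_adder(1, 1, 1) returns (1, 1) — binary 11, which is 3, matching the biggest case above.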
The carry-out from the right-most adder is passed along to the second adder, just like in long addition. Imagine setting up 64 of those adders in a chain, so you could add two 64-bit numbers together.
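A chain like that is called a ripple-carry adder; here is a quick behavioral sketch in Python (the loop plays the role of the 64 chained adder blocks):

```python
def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def ripple_add(a, b, width=64):
    # Each full adder has to wait for the carry-out of the one to its
    # right, so the worst-case delay grows linearly with the width.
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry
```

For example, ripple_add(12345, 67890) returns (80235, 0).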
How long would it take? The circuit diagram above shows that each sum goes through one or two gates, and each carry-out goes through two.
And the carry-out of one adder becomes the carry-in for the next one.

Carry-select adder

The trick that seems most obvious to me — and the only one I thought of before doing research — was apparently invented in 1960 by Sklansky: build two adders for each unit.
One computes the sum with a carry-in of 0, and the other computes with a carry-in of 1. When the real carry-in signal arrives, it selects which addition to use.
A mux takes two inputs and selects one or the other, based on a control signal. In this case, each mux uses the carry-in signal to determine which adder output to use, for each of the four sum bits along the bottom and the carry-out bit on the left.
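As a sketch (with a hypothetical add4 helper standing in for a 4-bit ripple unit):

```python
def add4(a, b, cin):
    # Stand-in for a 4-bit ripple adder: returns (4-bit sum, carry-out).
    total = (a & 0xF) + (b & 0xF) + cin
    return total & 0xF, total >> 4

def mux(sel, if0, if1):
    # Pick one of two precomputed values based on the control signal.
    return if1 if sel else if0

def carry_select_4(a, b, cin):
    # Compute both possible answers up front, in parallel...
    sum0, cout0 = add4(a, b, 0)
    sum1, cout1 = add4(a, b, 1)
    # ...then let the late-arriving carry-in pick the right one.
    return mux(cin, sum0, sum1), mux(cin, cout0, cout1)
```

The point of the design is that both speculative additions finish before the carry-in shows up, so only the mux delay is added to the critical path.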
The diagram gets simpler if we make a shortcut box for a series of connected adder units, and draw each group of 4 input or output bits as a thick gray bus. Now, for example, to compute the sum of two 16-bit numbers, we can split each number into four chunks of four bits each, and let each of these 4-bit chunks add in parallel. Simplifying the diagram a bit more, it looks like this. For a 64-bit adder, it would take 24 delays, because it would have 16 muxes instead of 4.
Going from 128 gate delays to 24 is a great start, and it only cost us a little less than twice as many gates! We can fuss with this and make it a little faster. If we compute only one bit at a time on the right, then two, then three, and so on as it goes left, we can shave off a few more. But… we can do better.
Next time, some trickier adding methods that end up being quicker.

Be sure to read part 1 before diving into this! Each column of an addition can be described by two bits: it generates a carry if both of its input bits are 1, and it propagates an incoming carry if at least one of them is 1. Both of these cases are the same whether the carry-in is 0 or 1. A unit will have a carry-out if it generates one, or it propagates one and the lowest bit generated one, or it propagates one and the lowest bit propagates one and the carry-in was 1.
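In Python terms (a behavioral sketch of the generate/propagate idea, with my own helper names):

```python
def gen_prop(a, b):
    # A column generates a carry when both bits are 1,
    # and propagates an incoming carry when at least one is 1.
    return a & b, a | b

def carry_chain(columns, cin):
    # columns: list of (a, b) bit pairs, least significant first.
    # Each column's carry-out is: generate, OR propagate the carry-in.
    c = cin
    for a, b in columns:
        g, p = gen_prop(a, b)
        c = g | (p & c)
    return c
```

For example, carry_chain([(1, 1), (1, 0), (0, 1)], 0) returns 1: the low column generates a carry and the other two propagate it out.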
Parallel in small doses

These series can go on indefinitely. We could compute each carry bit in 3 gate delays, but to add 64 bits, it would require a pile of mythical many-input AND and OR gates, and a lot of silicon. Computing each sum bit from its carry-in adds one more gate, for a total of 4 gate delays to compute the whole 2-bit sum. If we built a set of 4-bit adders this way — assuming a 6-way OR gate is fine — our carry-select adder could add two 64-bit numbers in 19 gate delays. The mux ripples now account for almost all of the delay.
Kogge-Stone

In 1973, probably while listening to a King Crimson album, Kogge and Stone came up with the idea of parallel-prefix computation. Their paper was a description of how to generalize recursive linear functions into forms that can be quickly combined in an arbitrary order, but um, they were being coy in a way that math people do.
What they were really getting at is that these G and P values can be combined before being used.
If you combine two columns together, we can say that as a whole, they may generate or propagate a carry. If the left one generates, or the left one propagates and the right one generates, then the whole two-column unit will generate a carry. The unit will only propagate a carry bit across if both columns are propagating.
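That combining rule is only a couple of gates; as a sketch in Python:

```python
def combine(left, right):
    # Merge two adjacent (G, P) pairs into one pair for the wider unit.
    gl, pl = left
    gr, pr = right
    g = gl | (pl & gr)  # left generates, or left passes on what right generates
    p = pl & pr         # a carry crosses only if both halves propagate
    return g, p
```

For example, combine((0, 1), (1, 0)) returns (1, 0): the right half generates and the left half propagates it out, but the pair as a whole cannot propagate an incoming carry.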
It looks like this: This is the country where cowboys ride horses that go twice as far with each hoofstep. But seriously, it means we can compute the final carry in an 8-bit adder in 3 steps. Well, the numbers at the top represent the computed P and G bit for each of the 8 columns of our 8-bit adder. The diamonds combine two adjacent sets of columns and produce a new combined P and G for the set.
If this works, at the bottom, each arrow should represent the combined P and G for that column and every column to its right. Look at the line on the far left, and trace it back up. It combines the lines from 7 and 3, and as we trace that up again, each of those combines two more units, and then again to cover all 8 columns.
The same path up should work for each column. These combined P and G values represent the combined value for each set of columns all the way to the right edge, so they can be used to compute the carry-out for each column from the original carry-in bit, instead of rippling. As we saw above, each combining operation is two gates, and computing the original P and G is one more. For a 64-bit adder, we need 6 combining steps, and get our result in 16 gate delays!
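Putting it together, here is a small behavioral model of a Kogge-Stone adder (my own sketch, not code from the original paper): each round combines (G, P) pairs twice as far apart as the last, so log2(width) rounds cover every column.

```python
def kogge_stone_add(a, b, width=8):
    # Per-column generate and propagate bits, least significant first.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]
    # Parallel-prefix rounds: each round looks back twice as far.
    dist = 1
    while dist < width:
        for i in range(width - 1, dist - 1, -1):  # high to low: uses old values
            g[i] = g[i] | (p[i] & g[i - dist])
            p[i] = p[i] & p[i - dist]
        dist *= 2
    # g[i] is now the carry-out of column i (with carry-in 0),
    # so the carry-in to column i is g[i - 1].
    result = 0
    for i in range(width):
        cin = g[i - 1] if i > 0 else 0
        result |= (((a >> i) ^ (b >> i) ^ cin) & 1) << i
    return result, g[width - 1]
```

For example, kogge_stone_add(200, 100) returns (44, 1): 300 wraps to 44 in 8 bits, with a carry out.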
The Kogge-Stone adder is the fastest possible layout, because it scales logarithmically: every time we add a combining step, it doubles the number of bits that can be added. The downside is the sheer amount of wiring; it might even monopolize a lot of the chip space if we tried to build it.
Brent-Kung

In 1982, Brent and Kung proposed a sparser layout. If you walk up the tree from bottom to top on any column, it should still end up combining every other column to its right, but this time it uses far fewer connections to do so.
That is, it can be built more easily than the Kogge-Stone adder, even though it has nearly twice as many combination steps in it. This is more than our best case of 16 for the Kogge-Stone adder, and a bit more than our naive case of 24 with the carry-select adder. You can see this especially in column 3. That reduces the fan-out back to 2 without slowing anything down.
So if we were to combine this strategy with the carry-select strategy from last time, our carry bits could start rippling across the adder units before each unit finishes computing the intermediate bits. When a carry-select adder is used with k units, the ripple delay is k plus the time it takes to get a carry-out from the first unit.
So if we split our 64-bit adder into 8 8-bit Brent-Kung adders, and combine those into a carry-select adder, the 8-bit adders will compute their carry-out bits in 9 gate delays, after which the carry bits ripple through the muxes for 7 gate delays, for a total of 16. The sum bits are available after 14 gate delays, in plenty of time. So we got it down to 16 total, and this time in a pretty efficient way!
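The bookkeeping is easy to check, using the gate-delay counts quoted above:

```python
units = 8               # eight 8-bit Brent-Kung chunks in a 64-bit adder
first_cout = 9          # gate delays until the first chunk's carry-out
mux_ripple = units - 1  # the carry then ripples through one mux per remaining chunk
total = first_cout + mux_ripple
print(total)  # 16
```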
Proof that humans can make anything complicated, if they try hard enough. There are a bunch of other historical strategies, but I thought these were the most interesting and effective.