FPGA Central - World's 1st FPGA / CPLD Portal

FPGA comp.arch.fpga newsgroup (usenet)

#1
07-20-2005, 12:52 AM
Brad Smallridge
Ones Count 64 bit on Xilinx in VHDL

Hello Group,

What is the best way to count 64 incoming simultaneous
bit signals to determine the number of 1s (in VHDL)?
I have clock cycles to spare but the result must be pipelined
so that each clock cycle produces a new count.

Brad Smallridge
b r a d @ a i v i s i o n . c o m



#2
07-20-2005, 12:58 AM
John_H
Re: Ones Count 64 bit on Xilinx in VHDL

Add them.

Add registers to your path and make your tool retime them.

This has been covered in the newsgroup in the past. How many levels of
logic you can deal with depends on your device and your clock. Just adding
the individual bits together will produce the desired results and you can
pipeline to your heart's content allowing a new result every clock (after
the initial latency) in the time it takes to run through one carry-chain
adder.


"Brad Smallridge" <[email protected]> wrote in message
news:[email protected]
> Hello Group,
>
> What is the best way to count 64 incoming simultaneous
> bit signals to determine the number of 1s (in VHDL)?
> I have clock cycles to spare but the result must be pipelined
> so that each clock cycle produces a new count.
>
> Brad Smallridge
> b r a d @ a i v i s i o n . c o m



#3
07-20-2005, 01:12 AM
Brad Smallridge
Re: Ones Count 64 bit on Xilinx in VHDL

Yeah, I understand this. But I can't wrap my head around how to code it.

Do you do it like this:
if( clk'event and clk='1') then
partial_sum1_2bit <= ('0'&bit0) + ('0'&bit1);
partial_sum2_2bit <= ('0'&bit2) + ('0'&bit3);
partial_sum1_3bit <= ('0'&partial_sum1_2bit) + ('0'&partial_sum2_2bit);
-- and so on
end if;

And then there is the question of how this all synthesizes. For me that is
probably at 27MHz, optimized for area, not speed. I could use a little insight
from someone who's done this before.

Brad Smallridge
b r a d @ a i v i s i o n . c o m


#4
07-20-2005, 01:33 AM
Brad Smallridge
Re: Ones Count 64 bit on Xilinx in VHDL

I also don't understand what you mean by "having your tool
retime them". I don't have Precision or any advanced tools here.


#5
07-20-2005, 01:37 AM
Peter Alfke
Re: Ones Count 64 bit on Xilinx in VHDL

Brad,
what you need is called a Wallace-Tree Adder.
Here is a fairly efficient implementation:
Divide your input into groups of 12 bits, and use each as address bits
to a BlockRAM, loaded to output a 4-bit number that describes the
number of 1s in the address.
You can treat each port independently, so one BlockRAM handles 24
inputs and generates two 4-bit outputs. Three BlockRAMs handle 72
incoming bits, and produce six sets of 4-bit values. You can combine
them with five 6-bit adders on 3 levels, giving you a total of 4
pipeline delays.
This is just one of many ways to solve your design problem...
I like to use BlockRAMs for unconventional purposes.
Peter Alfke
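
The lookup-table idea above might be sketched in VHDL roughly as follows. This is only an illustration under assumptions: the entity name ones_rom12, the port names and the table-filling function are invented here, and whether a given synthesizer maps the constant array onto one dual-port BlockRAM depends on the tool and its inference rules.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4096 x 4 lookup: each address holds the number of '1's
-- in that 12-bit address. Two read ports share the same table.
entity ones_rom12 is
  port (
    clk    : in  std_logic;
    addr_a : in  std_logic_vector(11 downto 0);
    addr_b : in  std_logic_vector(11 downto 0);
    ones_a : out unsigned(3 downto 0);   -- 0 to 12
    ones_b : out unsigned(3 downto 0)    -- 0 to 12
  );
end entity ones_rom12;

architecture rtl of ones_rom12 is
  type rom_t is array (0 to 4095) of unsigned(3 downto 0);

  -- Fill the table at elaboration time by counting the bits of each address.
  function init_table return rom_t is
    variable table : rom_t;
    variable addr  : unsigned(11 downto 0);
    variable n     : unsigned(3 downto 0);
  begin
    for a in rom_t'range loop
      addr := to_unsigned(a, 12);
      n    := (others => '0');
      for i in addr'range loop
        if addr(i) = '1' then
          n := n + 1;
        end if;
      end loop;
      table(a) := n;
    end loop;
    return table;
  end function init_table;

  constant ONES_TABLE : rom_t := init_table;
begin
  -- Registered reads, one per port (one clock of latency).
  process (clk)
  begin
    if rising_edge(clk) then
      ones_a <= ONES_TABLE(to_integer(unsigned(addr_a)));
      ones_b <= ONES_TABLE(to_integer(unsigned(addr_b)));
    end if;
  end process;
end architecture rtl;
```

Three such blocks would cover 72 inputs and leave six 4-bit counts to be added, as described in the post.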

Brad Smallridge wrote:
> Yeah, I understand this. But I can't wrap my head around how to code it.
>
> Do you do it like this:
> if( clk'event and clk='1') then
> partial_sum1_2bit <= ('0'&bit0) + ('0'&bit1);
> partial_sum2_2bit <= ('0'&bit2) + ('0'&bit3);
> partial_sum1_3bit <= ('0'&partial_sum1_2bit) + ('0'&partial_sum2_2bit);
> -- and so on
> end if;
>
> And then there is the question of how this all synthesizes. For me that is
> probably at 27MHz, optimized for area, not speed. I could use a little insight
> from someone who's done this before.
>
> Brad Smallridge
> b r a d @ a i v i s i o n . c o m


#6
07-20-2005, 02:05 AM
JJ
Re: Ones Count 64 bit on Xilinx in VHDL

64 is 0+63+1
63 is 31+31+1
31 is 15+15+1
15 is 7+7+1
7 is 3+3+1
simple recursion

a few adder rows should be pretty quick and way less resources than
BlockRam, takes about 6 levels of small adders

#7
07-20-2005, 02:05 AM
Peter Alfke
Re: Ones Count 64 bit on Xilinx in VHDL

In case you are interested in price and performance:
3 BlockRAMs plus 6 CLBs, four levels of pipelining, running at 200 MHz+
Not too bad :-)
Peter Alfke

#8
07-20-2005, 05:09 AM
Ray Andraka
Re: Ones Count 64 bit on Xilinx in VHDL

Brad Smallridge wrote:

>Hello Group,
>
>What is the best way to count 64 incoming simultaneous
>bit signals to determine the number of 1s (in VHDL)?
>I have clock cycles to spare but the result must be pipelined
>so that each clock cycle produces a new count.
>
>Brad Smallridge
>b r a d @ a i v i s i o n . c o m
>
>
>
>
>

Brad,
Basically, you want to gather bits together in small adders. A Wallace
tree does that using full adders to compress 3 single-bit inputs, all
with the same weight, into two signals, a sum and a carry. The sum has
the same weight as the inputs, the carry has weight 2x the input. Then
you use another layer to sum all like-weighted bits, and repeat until
you are left with two signals of each weight. You combine those with a
conventional adder.
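
As a minimal sketch of the building block described above, a full adder used as a 3:2 compressor might look like this in VHDL. The entity and port names are made up for illustration and do not come from the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 3:2 compressor (a plain full adder).
entity compress_3to2 is
  port (
    a, b, c : in  std_logic;   -- three inputs of equal weight
    sum     : out std_logic;   -- same weight as the inputs
    carry   : out std_logic    -- twice the weight of the inputs
  );
end entity compress_3to2;

architecture rtl of compress_3to2 is
begin
  -- sum is '1' when an odd number of inputs are '1'
  sum   <= a xor b xor c;
  -- carry is '1' when at least two inputs are '1'
  carry <= (a and b) or (a and c) or (b and c);
end architecture rtl;
```

Each layer of such cells turns three bits of one weight into one bit of that weight plus one bit of the next weight up, which is the reduction step the tree repeats.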

What Peter described is going to be more clock cycle efficient because
you use the BRAM in place of a Wallace tree. His description isn't
really a Wallace tree because it doesn't have the same structure (no
tree of carry-save adders, and the final outputs are complete sums of
the bits for those BRAMs, not a carry vector and a sum vector like a
Wallace tree). You could use Wallace trees to combine the results from
the BRAMs, but it isn't efficient in an FPGA.





--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email [email protected]
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759


#9
07-20-2005, 06:54 PM
John_H
Re: Ones Count 64 bit on Xilinx in VHDL

If I were to do it in Verilog, I might use
always @(posedge Clk27M)
TotalOnes <= in[0]+in[1]+in[2]+in[3]+in[4]+in[5]+in[6]+... and continue
typing until I reach +in[63];

The synthesizer MAY produce superb results.
If it grinds, split it into groups - 8 groups of 8 or 4 groups of 16 - and add
*those* values together as a multiple-value addition.


"Brad Smallridge" <[email protected]> wrote in message
news:[email protected]
> Yeah, I understand this. But I can't wrap my head around how to code it.
>
> Do you do it like this:
> if( clk'event and clk='1') then
> partial_sum1_2bit <= ('0'&bit0) + ('0'&bit1);
> partial_sum2_2bit <= ('0'&bit2) + ('0'&bit3);
> partial_sum1_3bit <= ('0'&partial_sum1_2bit) + ('0'&partial_sum2_2bit);
> -- and so on
> end if;
>
> And then there is the question of how this all synthesizes. For me that is
> probably at 27MHz, optimized for area, not speed. I could use a little insight
> from someone who's done this before.
>
> Brad Smallridge
> b r a d @ a i v i s i o n . c o m



#10
07-20-2005, 07:30 PM
Vladislav Muravin
Re: Ones Count 64 bit on Xilinx in VHDL

Brad,

There are so many ways of doing this, depending on your FPGA family,
required timing and the available resources. Other than using the
"natural resources" (simple LUTs, pipelines, even multipliers, etc.), you can
also use memories. Personally, I like using memories for state machines,
especially for channelized state machines, or as a LUT for pre-computed CRC
calculation.

If we are talking about the Virtex family, we have 16384-bit RAMs, which can be
used as a 4096x4 LUT: the 12-bit vector is applied as the address, and each
entry holds the number of '1's in that address. It is clear how to expand this
concept to a vector of any width, depending on the timing requirements and the
available resources.

One way is that you can try 5 memory blocks like this and it will give you
60 bits covered, then simply add the "data_out"s and the extra bits and
pipeline them.
There could be more "balanced" or optimal usage of memories and FFs.

I hope I did not make any math mistake here.

Vladislav



"Brad Smallridge" <[email protected]> wrote in message
news:[email protected]
> Hello Group,
>
> What is the best way to count 64 incoming simultaneous
> bit signals to determine the number of 1s (in VHDL)?
> I have clock cycles to spare but the result must be pipelined
> so that each clock cycle produces a new count.
>
> Brad Smallridge
> b r a d @ a i v i s i o n . c o m
>
>
>



#11
07-20-2005, 09:25 PM
Brad Smallridge
Re: Ones Count 64 bit on Xilinx in VHDL

I would like to switch to Verilog, but not on this project.

> If I were to do it in Verilog, I might use
> always @(posedge Clk27M)
> TotalOnes <= in[0]+in[1]+in[2]+in[3]+in[4]+in[5]+in[6]+... and continue
> typing until I reach +in[63];
>



#12
07-20-2005, 11:19 PM
Peter Alfke
Re: Ones Count 64 bit on Xilinx in VHDL

Vladislav, I agree. And the nicest thing is that you can fold two
BlockRAMs into one, by using the two ports independently. So one
BlockRAM takes care of 24 inputs and generates two sets of 4 bits each.
That means you need only 3 BlockRAMs for up to 72 inputs. (plus a few
CLBs to combine the outputs, unless you want to use two more BlockRAMs
to do that) 5 BlockRAMs total gives a 2-clock latency.
It all depends what you are after, speed or cost.
Peter Alfke

#13
07-20-2005, 11:37 PM
Ben Twijnstra
Re: Ones Count 64 bit on Xilinx in VHDL

Hi Peter,

> Vladislav, I agree. And the nicest thing is that you can fold two
> BlockRAMs into one, by using the two ports independently. So one
> BlockRAM takes care of 24 inputs and generates two sets of 4 bits each.
> That means you need only 3 BlockRAMs for up to 72 inputs. (plus a few
> CLBs to combine the outputs, unless you want to use two more BlockRAMs
> to do that) 5 BlockRAMs total gives a 2-clock latency.
> It all depends what you are after, speed or cost.


I personally feel that using BlockRAMs is a bit wasteful - I coded something
up in VHDL that used 144 LEs in an Altera Cyclone 1, slowest speed grade,
running at 115MHz with two clocks of latency as well. No idea how big that
would be in a Spartan - my guess is that it would be similar.

Then again, if there's no LUTs left, and there's some leftover BRAMs, then
sure this is a great solution.

BTW: Peter, would you (plural) mind if I downloaded a WebPack so I can
compare?

Best regards,


Ben

#14
07-21-2005, 12:12 AM
Ray Andraka
Re: Ones Count 64 bit on Xilinx in VHDL

Ben Twijnstra wrote:

Ben, you are correct, IF you need the block RAMs elsewhere in your
design, or if they are not located conveniently with respect to the
logic this is related to. Using LUTs, it can be done in 5 layers of
logic, which even without pipelining but with floorplanning will run
pretty quickly. If you pipeline it on every layer, it might even
out-perform the BRAM, but only if you are very careful about the
placement.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email [email protected]
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759


#15
07-21-2005, 01:00 AM
John_H
Re: Ones Count 64 bit on Xilinx in VHDL

I wasn't suggesting you should switch to Verilog. The code that I
showed is Verilog, but the concept should translate directly to VHDL. Add 64
1-bit values in a single VHDL line. If the synthesizer doesn't do a good
job, have eight lines of eight values each, then add those 8 4-bit results in
one line to get your 7-bit result.
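
One way the eight-groups-of-eight suggestion could be written in VHDL is sketched below. It assumes the 64 inputs arrive as a single vector, here called din, and the entity name, port names and the helper function ones8 are all invented for this example. How well it maps to logic depends on the synthesizer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage pipelined ones counter for 64 inputs.
entity ones_count64 is
  port (
    clk   : in  std_logic;
    din   : in  std_logic_vector(63 downto 0);
    count : out unsigned(6 downto 0)   -- 0 to 64
  );
end entity ones_count64;

architecture rtl of ones_count64 is
  -- Number of '1' bits in an 8-bit slice (0 to 8, so 4 bits).
  function ones8 (v : std_logic_vector(7 downto 0)) return unsigned is
    variable n : unsigned(3 downto 0) := (others => '0');
  begin
    for i in v'range loop
      if v(i) = '1' then
        n := n + 1;
      end if;
    end loop;
    return n;
  end function ones8;

  type group_sums_t is array (0 to 7) of unsigned(3 downto 0);
  signal group_sum : group_sums_t;
begin
  process (clk)
    variable total : unsigned(6 downto 0);
  begin
    if rising_edge(clk) then
      -- Stage 1: eight independent 8-bit counts.
      for g in 0 to 7 loop
        group_sum(g) <= ones8(din(8*g + 7 downto 8*g));
      end loop;
      -- Stage 2: add the eight 4-bit results registered on the previous clock.
      total := (others => '0');
      for g in 0 to 7 loop
        total := total + resize(group_sum(g), total'length);
      end loop;
      count <= total;
    end if;
  end process;
end architecture rtl;
```

A new count comes out every clock after a two-clock latency. If timing is tight, more register stages can be added between the group sums and the final addition.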

"Brad Smallridge" <[email protected]> wrote in message
news:[email protected]
> I would like to switch to Verilog, but not on this project.
>
> > If I were to do it in Verilog, I might use
> > always @(posedge Clk27M)
> > TotalOnes <= in[0]+in[1]+in[2]+in[3]+in[4]+in[5]+in[6]+... and continue
> > typing until I reach +in[63];
> >

>
>



#16
07-21-2005, 06:40 AM
JJ
Re: Ones Count 64 bit on Xilinx in VHDL

I did this with 63 inputs, all 32 bits wide, in a plain Virtex 800 many
years ago.

If you are building a synchronizer for a 64-bit sync field, and you can
cut off 1 bit, either the first or the last, and use 63 bits, you can save the
last row of adders. Since mine was 32 wide it saved a lot more than 6
adders. The 1-bit loss probably wouldn't affect a synchronizer
application.

I wouldn't want to replicate 3 BRAMs 32 times though.

What's the application?

#17
07-21-2005, 11:36 PM
JustJohn
Re: Ones Count 64 bit on Xilinx in VHDL

Brad Smallridge wrote:
> Hello Group,
>
> What is the best way to count 64 incoming simultaneous
> bit signals to determine the number of 1s (in VHDL)?
> I have clock cycles to spare but the result must be pipelined
> so that each clock cycle produces a new count.


In case you haven't found it yet, VHDL code for 30 bits (without
pipelining, that should be easy to add as a bunch of registers at the
end, which XST or Synplify may apply register re-timing to) was posted
at:

http://groups-beta.google.com/group/...80e49da22bae56

Should be straightforward to extend to 64 bits.

If it's not too much trouble, can I ask what is the application?
I thought the circuit was neat, but wonder how folks use it.

John

#18
07-22-2005, 01:59 AM
Peter Alfke
Re: Ones Count 64 bit on Xilinx in VHDL

JJ, I hope you realize that 3 BRAMs is all you need. Nobody would
suggest to replicate them. For what??
Peter Alfke

#19
07-22-2005, 07:42 AM
Brad Smallridge
Re: Ones Count 64 bit on Xilinx in VHDL

Well, thanks for all your suggestions. As far as BRAMs, I would
rather use them elsewhere. I ended up with this rather verbose
code shown below. I don't know how well it synthesizes, probably not
too well, because I think it is using several hundred LUTs. It's actually
a 62 ones counter and the bits can be turned off from the center out
with the B signals.

Brad

signal B00 : std_logic;
signal B01 : std_logic;
signal B02 : std_logic;
signal B03 : std_logic;
signal B04 : std_logic;
signal B05 : std_logic;
signal B06 : std_logic;
signal B07 : std_logic;
signal B08 : std_logic;
signal B09 : std_logic;
signal B10 : std_logic;
signal B11 : std_logic;
signal B12 : std_logic;
signal B13 : std_logic;
signal B14 : std_logic;
signal B15 : std_logic; -- center
signal B16 : std_logic;
signal B17 : std_logic;
signal B18 : std_logic;
signal B19 : std_logic;
signal B20 : std_logic;
signal B21 : std_logic;
signal B22 : std_logic;
signal B23 : std_logic;
signal B24 : std_logic;
signal B25 : std_logic;
signal B26 : std_logic;
signal B27 : std_logic;
signal B28 : std_logic;
signal B29 : std_logic;
signal B30 : std_logic;

signal EL00 : std_logic;
signal EL01 : std_logic;
signal EL02 : std_logic;
signal EL03 : std_logic;
signal EL04 : std_logic;
signal EL05 : std_logic;
signal EL06 : std_logic;
signal EL07 : std_logic;
signal EL08 : std_logic;
signal EL09 : std_logic;
signal EL10 : std_logic;
signal EL11 : std_logic;
signal EL12 : std_logic;
signal EL13 : std_logic;
signal EL14 : std_logic;
signal EL15 : std_logic;
signal EL16 : std_logic;
signal EL17 : std_logic;
signal EL18 : std_logic;
signal EL19 : std_logic;
signal EL20 : std_logic;
signal EL21 : std_logic;
signal EL22 : std_logic;
signal EL23 : std_logic;
signal EL24 : std_logic;
signal EL25 : std_logic;
signal EL26 : std_logic;
signal EL27 : std_logic;
signal EL28 : std_logic;
signal EL29 : std_logic;
signal EL30 : std_logic;

signal ER00 : std_logic;
signal ER01 : std_logic;
signal ER02 : std_logic;
signal ER03 : std_logic;
signal ER04 : std_logic;
signal ER05 : std_logic;
signal ER06 : std_logic;
signal ER07 : std_logic;
signal ER08 : std_logic;
signal ER09 : std_logic;
signal ER10 : std_logic;
signal ER11 : std_logic;
signal ER12 : std_logic;
signal ER13 : std_logic;
signal ER14 : std_logic;
signal ER15 : std_logic;
signal ER16 : std_logic;
signal ER17 : std_logic;
signal ER18 : std_logic;
signal ER19 : std_logic;
signal ER20 : std_logic;
signal ER21 : std_logic;
signal ER22 : std_logic;
signal ER23 : std_logic;
signal ER24 : std_logic;
signal ER25 : std_logic;
signal ER26 : std_logic;
signal ER27 : std_logic;
signal ER28 : std_logic;
signal ER29 : std_logic;
signal ER30 : std_logic;

signal sum_2_00 : std_logic_vector(1 downto 0);
signal sum_2_01 : std_logic_vector(1 downto 0);
signal sum_2_02 : std_logic_vector(1 downto 0);
signal sum_2_03 : std_logic_vector(1 downto 0);
signal sum_2_04 : std_logic_vector(1 downto 0);
signal sum_2_05 : std_logic_vector(1 downto 0);
signal sum_2_06 : std_logic_vector(1 downto 0);
signal sum_2_07 : std_logic_vector(1 downto 0);
signal sum_2_08 : std_logic_vector(1 downto 0);
signal sum_2_09 : std_logic_vector(1 downto 0);
signal sum_2_10 : std_logic_vector(1 downto 0);
signal sum_2_11 : std_logic_vector(1 downto 0);
signal sum_2_12 : std_logic_vector(1 downto 0);
signal sum_2_13 : std_logic_vector(1 downto 0);
signal sum_2_14 : std_logic_vector(1 downto 0);
signal sum_2_15 : std_logic_vector(1 downto 0);
signal sum_2_16 : std_logic_vector(1 downto 0);
signal sum_2_17 : std_logic_vector(1 downto 0);
signal sum_2_18 : std_logic_vector(1 downto 0);
signal sum_2_19 : std_logic_vector(1 downto 0);
signal sum_2_20 : std_logic_vector(1 downto 0);
signal sum_2_21 : std_logic_vector(1 downto 0);
signal sum_2_22 : std_logic_vector(1 downto 0);
signal sum_2_23 : std_logic_vector(1 downto 0);
signal sum_2_24 : std_logic_vector(1 downto 0);
signal sum_2_25 : std_logic_vector(1 downto 0);
signal sum_2_26 : std_logic_vector(1 downto 0);
signal sum_2_27 : std_logic_vector(1 downto 0);
signal sum_2_28 : std_logic_vector(1 downto 0);
signal sum_2_29 : std_logic_vector(1 downto 0);
signal sum_2_30 : std_logic_vector(1 downto 0);

signal sum_3_0 : std_logic_vector(2 downto 0);
signal sum_3_1 : std_logic_vector(2 downto 0);
signal sum_3_2 : std_logic_vector(2 downto 0);
signal sum_3_3 : std_logic_vector(2 downto 0);
signal sum_3_4 : std_logic_vector(2 downto 0);
signal sum_3_5 : std_logic_vector(2 downto 0);
signal sum_3_6 : std_logic_vector(2 downto 0);
signal sum_3_7 : std_logic_vector(2 downto 0);
signal sum_3_8 : std_logic_vector(2 downto 0);
signal sum_3_9 : std_logic_vector(2 downto 0);
signal sum_3_10 : std_logic_vector(2 downto 0);
signal sum_3_11 : std_logic_vector(2 downto 0);
signal sum_3_12 : std_logic_vector(2 downto 0);
signal sum_3_13 : std_logic_vector(2 downto 0);
signal sum_3_14 : std_logic_vector(2 downto 0);
signal sum_3_15 : std_logic_vector(2 downto 0);

signal sum_4_0 : std_logic_vector(3 downto 0);
signal sum_4_1 : std_logic_vector(3 downto 0);
signal sum_4_2 : std_logic_vector(3 downto 0);
signal sum_4_3 : std_logic_vector(3 downto 0);
signal sum_4_4 : std_logic_vector(3 downto 0);
signal sum_4_5 : std_logic_vector(3 downto 0);
signal sum_4_6 : std_logic_vector(3 downto 0);
signal sum_4_7 : std_logic_vector(3 downto 0);

signal sum_5_0 : std_logic_vector(4 downto 0);
signal sum_5_1 : std_logic_vector(4 downto 0);
signal sum_5_2 : std_logic_vector(4 downto 0);
signal sum_5_3 : std_logic_vector(4 downto 0);

signal sum_6_0 : std_logic_vector(5 downto 0);
signal sum_6_1 : std_logic_vector(5 downto 0);

signal sum_7_0 : std_logic_vector(6 downto 0);


begin

s15: process(clk)
begin
if(clk'event and clk='1') then
sum_2_15 <= "00";
if( B15='1') then
if( EL15='1' and ER15='1') then
sum_2_15 <= "10";
elsif( EL15='1' or ER15='1') then
sum_2_15 <= "01";
end if;
end if;
end if;
end process;

s1416: process(clk)
begin
if(clk'event and clk='1') then
sum_2_14 <= "00";
sum_2_16 <= "00";
if( B15='1' and B14='1' and B16='1' ) then
if( EL14='1' and ER14='1') then
sum_2_14 <= "10";
elsif( EL14='1' or ER14='1') then
sum_2_14 <= "01";
end if;
if( EL16='1' and ER16='1') then
sum_2_16 <= "10";
elsif( EL16='1' or ER16='1') then
sum_2_16 <= "01";
end if;
end if;
end if;
end process;

s1317: process(clk)
begin
if(clk'event and clk='1') then
sum_2_13 <= "00";
sum_2_17 <= "00";
if( B15='1' and B16='1' and B17='1'
and B14='1' and B13='1') then
if( EL13='1' and ER13='1') then
sum_2_13 <= "10";
elsif( EL13='1' or ER13='1') then
sum_2_13 <= "01";
end if;
if( EL17='1' and ER17='1') then
sum_2_17 <= "10";
elsif( EL17='1' or ER17='1') then
sum_2_17 <= "01";
end if;
end if;
end if;
end process;

s1218: process(clk)
begin
if(clk'event and clk='1') then
sum_2_12 <= "00";
sum_2_18 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1'
and B14='1' and B13='1' and B12='1' ) then
if( EL12='1' and ER12='1') then
sum_2_12 <= "10";
elsif( EL12='1' or ER12='1') then
sum_2_12 <= "01";
end if;
if( EL18='1' and ER18='1') then
sum_2_18 <= "10";
elsif( EL18='1' or ER18='1') then
sum_2_18 <= "01";
end if;
end if;
end if;
end process;

s1119: process(clk)
begin
if(clk'event and clk='1') then
sum_2_11 <= "00";
sum_2_19 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B14='1' and B13='1' and B12='1' and B11='1') then
if( EL11='1' and ER11='1') then
sum_2_11 <= "10";
elsif( EL11='1' or ER11='1') then
sum_2_11 <= "01";
end if;
if( EL19='1' and ER19='1') then
sum_2_19 <= "10";
elsif( EL19='1' or ER19='1') then
sum_2_19 <= "01";
end if;
end if;
end if;
end process;

s1020: process(clk)
begin
if(clk'event and clk='1') then
sum_2_10 <= "00";
sum_2_20 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1') then
if( EL10='1' and ER10='1') then
sum_2_10 <= "10";
elsif( EL10='1' or ER10='1') then
sum_2_10 <= "01";
end if;
if( EL20='1' and ER20='1') then
sum_2_20 <= "10";
elsif( EL20='1' or ER20='1') then
sum_2_20 <= "01";
end if;
end if;
end if;
end process;

s0921: process(clk)
begin
if(clk'event and clk='1') then
sum_2_09 <= "00";
sum_2_21 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1'
) then
if( EL09='1' and ER09='1') then
sum_2_09 <= "10";
elsif( EL09='1' or ER09='1') then
sum_2_09 <= "01";
end if;
if( EL21='1' and ER21='1') then
sum_2_21 <= "10";
elsif( EL21='1' or ER21='1') then
sum_2_21 <= "01";
end if;
end if;
end if;
end process;

s0822: process(clk)
begin
if(clk'event and clk='1') then
sum_2_08 <= "00";
sum_2_22 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1'
) then
if( EL08='1' and ER08='1') then
sum_2_08 <= "10";
elsif( EL08='1' or ER08='1') then
sum_2_08 <= "01";
end if;
if( EL22='1' and ER22='1') then
sum_2_22 <= "10";
elsif( EL22='1' or ER22='1') then
sum_2_22 <= "01";
end if;
end if;
end if;
end process;

s0723: process(clk)
begin
if(clk'event and clk='1') then
sum_2_07 <= "00";
sum_2_23 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
) then
if( EL07='1' and ER07='1') then
sum_2_07 <= "10";
elsif( EL07='1' or ER07='1') then
sum_2_07 <= "01";
end if;
if( EL23='1' and ER23='1') then
sum_2_23 <= "10";
elsif( EL23='1' or ER23='1') then
sum_2_23 <= "01";
end if;
end if;
end if;
end process;

s0624: process(clk)
begin
if(clk'event and clk='1') then
sum_2_06 <= "00";
sum_2_24 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1'
) then
if( EL06='1' and ER06='1') then
sum_2_06 <= "10";
elsif( EL06='1' or ER06='1') then
sum_2_06 <= "01";
end if;
if( EL24='1' and ER24='1') then
sum_2_24 <= "10";
elsif( EL24='1' or ER24='1') then
sum_2_24 <= "01";
end if;
end if;
end if;
end process;

s0525: process(clk)
begin
if(clk'event and clk='1') then
sum_2_05 <= "00";
sum_2_25 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1' and B25='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1' and B05='1'
) then
if( EL05='1' and ER05='1') then
sum_2_05 <= "10";
elsif( EL05='1' or ER05='1') then
sum_2_05 <= "01";
end if;
if( EL25='1' and ER25='1') then
sum_2_25 <= "10";
elsif( EL25='1' or ER25='1') then
sum_2_25 <= "01";
end if;
end if;
end if;
end process;

s0426: process(clk)
begin
if(clk'event and clk='1') then
sum_2_04 <= "00";
sum_2_26 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1' and B25='1' and B26='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1' and B05='1' and B04='1'
) then
if( EL04='1' and ER04='1') then
sum_2_04 <= "10";
elsif( EL04='1' or ER04='1') then
sum_2_04 <= "01";
end if;
if( EL26='1' and ER26='1') then
sum_2_26 <= "10";
elsif( EL26='1' or ER26='1') then
sum_2_26 <= "01";
end if;
end if;
end if;
end process;

s0327: process(clk)
begin
if(clk'event and clk='1') then
sum_2_03 <= "00";
sum_2_27 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1' and B25='1' and B26='1' and B27='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1' and B05='1' and B04='1' and B03='1'
) then
if( EL03='1' and ER03='1') then
sum_2_03 <= "10";
elsif( EL03='1' or ER03='1') then
sum_2_03 <= "01";
end if;
if( EL27='1' and ER27='1') then
sum_2_27 <= "10";
elsif( EL27='1' or ER27='1') then
sum_2_27 <= "01";
end if;
end if;
end if;
end process;

s0228: process(clk)
begin
if(clk'event and clk='1') then
sum_2_02 <= "00";
sum_2_28 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1' and B25='1' and B26='1' and B27='1'
and B28='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1' and B05='1' and B04='1' and B03='1'
and B02='1'
) then
if( EL02='1' and ER02='1') then
sum_2_02 <= "10";
elsif( EL02='1' or ER02='1') then
sum_2_02 <= "01";
end if;
if( EL28='1' and ER28='1') then
sum_2_28 <= "10";
elsif( EL28='1' or ER28='1') then
sum_2_28 <= "01";
end if;
end if;
end if;
end process;

s0129: process(clk)
begin
if(clk'event and clk='1') then
sum_2_01 <= "00";
sum_2_29 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1' and B25='1' and B26='1' and B27='1'
and B28='1' and B29='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1' and B05='1' and B04='1' and B03='1'
and B02='1' and B01='1'
) then
if( EL01='1' and ER01='1') then
sum_2_01 <= "10";
elsif( EL01='1' or ER01='1') then
sum_2_01 <= "01";
end if;
if( EL29='1' and ER29='1') then
sum_2_29 <= "10";
elsif( EL29='1' or ER29='1') then
sum_2_29 <= "01";
end if;
end if;
end if;
end process;

s0030: process(clk)
begin
if(clk'event and clk='1') then
sum_2_00 <= "00";
sum_2_30 <= "00";
if( B15='1' and B16='1' and B17='1' and B18='1' and B19='1'
and B20='1' and B21='1' and B22='1' and B23='1'
and B24='1' and B25='1' and B26='1' and B27='1'
and B28='1' and B29='1' and B30='1'
and B14='1' and B13='1' and B12='1' and B11='1'
and B10='1' and B09='1' and B08='1' and B07='1'
and B06='1' and B05='1' and B04='1' and B03='1'
and B02='1' and B01='1' and B00='1'
) then
if( EL00='1' and ER00='1') then
sum_2_00 <= "10";
elsif( EL00='1' or ER00='1') then
sum_2_00 <= "01";
end if;
if( EL30='1' and ER30='1') then
sum_2_30 <= "10";
elsif( EL30='1' or ER30='1') then
sum_2_30 <= "01";
end if;
end if;
end if;
end process;


-- I numbered the partial sums from the center out
-- in case there may be some future additional logic
-- that could share these partial sums.

s3: process(clk)
begin
if(clk'event and clk='1') then
sum_3_0 <= '0'&sum_2_15;
sum_3_1 <= ('0'&sum_2_14) + ('0'&sum_2_16);
sum_3_2 <= ('0'&sum_2_13) + ('0'&sum_2_17);
sum_3_3 <= ('0'&sum_2_12) + ('0'&sum_2_18);
sum_3_4 <= ('0'&sum_2_11) + ('0'&sum_2_19);
sum_3_5 <= ('0'&sum_2_10) + ('0'&sum_2_20);
sum_3_6 <= ('0'&sum_2_09) + ('0'&sum_2_21);
sum_3_7 <= ('0'&sum_2_08) + ('0'&sum_2_22);
sum_3_8 <= ('0'&sum_2_07) + ('0'&sum_2_23);
sum_3_9 <= ('0'&sum_2_06) + ('0'&sum_2_24);
sum_3_10 <= ('0'&sum_2_05) + ('0'&sum_2_25);
sum_3_11 <= ('0'&sum_2_04) + ('0'&sum_2_26);
sum_3_12 <= ('0'&sum_2_03) + ('0'&sum_2_27);
sum_3_13 <= ('0'&sum_2_02) + ('0'&sum_2_28);
sum_3_14 <= ('0'&sum_2_01) + ('0'&sum_2_29);
sum_3_15 <= ('0'&sum_2_00) + ('0'&sum_2_30);
end if;
end process;

s4: process(clk)
begin
if(clk'event and clk='1') then
sum_4_0 <= ('0'&sum_3_0) + ('0'&sum_3_1);
sum_4_1 <= ('0'&sum_3_2) + ('0'&sum_3_3);
sum_4_2 <= ('0'&sum_3_4) + ('0'&sum_3_5);
sum_4_3 <= ('0'&sum_3_6) + ('0'&sum_3_7);
sum_4_4 <= ('0'&sum_3_8) + ('0'&sum_3_9);
sum_4_5 <= ('0'&sum_3_10) + ('0'&sum_3_11);
sum_4_6 <= ('0'&sum_3_12) + ('0'&sum_3_13);
sum_4_7 <= ('0'&sum_3_14) + ('0'&sum_3_15);
end if;
end process;

s5: process(clk)
begin
if(clk'event and clk='1') then
sum_5_0 <= ('0'&sum_4_0) + ('0'&sum_4_1);
sum_5_1 <= ('0'&sum_4_2) + ('0'&sum_4_3);
sum_5_2 <= ('0'&sum_4_4) + ('0'&sum_4_5);
sum_5_3 <= ('0'&sum_4_6) + ('0'&sum_4_7);
end if;
end process;

s6: process(clk)
begin
if(clk'event and clk='1') then
sum_6_0 <= ('0'&sum_5_0) + ('0'&sum_5_1);
sum_6_1 <= ('0'&sum_5_2) + ('0'&sum_5_3);
end if;
end process;

s7: process(clk)
begin
if(clk'event and clk='1') then
sum_7_0 <= ('0'&sum_6_0) + ('0'&sum_6_1);
end if;
end process;
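
The same structure can be written more compactly with vectors and loops. What follows is only a sketch under assumptions: it replaces the scalar signals B00..B30, EL00..EL30 and ER00..ER30 with hypothetical vector ports b, el and er, uses numeric_std, and carries every partial sum at 7 bits on the expectation that synthesis trims the unused upper bits. It is untested against any particular synthesizer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical compact form of the gated 62-input ones counter.
entity gated_ones_count is
  port (
    clk   : in  std_logic;
    b     : in  std_logic_vector(30 downto 0);  -- enables; b(15) is the center
    el    : in  std_logic_vector(30 downto 0);
    er    : in  std_logic_vector(30 downto 0);
    total : out unsigned(6 downto 0)            -- 0 to 62
  );
end entity gated_ones_count;

architecture rtl of gated_ones_count is
  type sums_t is array (natural range <>) of unsigned(6 downto 0);
  signal s2 : sums_t(0 to 30);
  signal s3 : sums_t(0 to 15);
  signal s4 : sums_t(0 to 7);
  signal s5 : sums_t(0 to 3);
  signal s6 : sums_t(0 to 1);

  -- Sum of two bits, forced to zero when the position is disabled.
  function pair_sum (en, l, r : std_logic) return unsigned is
    variable s : unsigned(6 downto 0) := (others => '0');
  begin
    if en = '1' then
      if l = '1' then s := s + 1; end if;
      if r = '1' then s := s + 1; end if;
    end if;
    return s;
  end function pair_sum;
begin
  process (clk)
    variable en : std_logic;
  begin
    if rising_edge(clk) then
      -- Walk outward from the center. A position counts only while every
      -- enable between it and the center, on both sides, is '1'.
      en := '1';
      for d in 0 to 15 loop
        en := en and b(15 - d) and b(15 + d);
        s2(15 - d) <= pair_sum(en, el(15 - d), er(15 - d));
        s2(15 + d) <= pair_sum(en, el(15 + d), er(15 + d));
      end loop;

      -- Pair the positions from the center out, then halve at each stage.
      s3(0) <= s2(15);
      for d in 1 to 15 loop
        s3(d) <= s2(15 - d) + s2(15 + d);
      end loop;
      for k in 0 to 7 loop
        s4(k) <= s3(2*k) + s3(2*k + 1);
      end loop;
      for k in 0 to 3 loop
        s5(k) <= s4(2*k) + s4(2*k + 1);
      end loop;
      for k in 0 to 1 loop
        s6(k) <= s5(2*k) + s5(2*k + 1);
      end loop;
      total <= s6(0) + s6(1);
    end if;
  end process;
end architecture rtl;
```

The latency is six clocks, the same number of register stages as the hand-written version, and a new total is produced every clock.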


#20
07-22-2005, 09:47 AM
JJ
Re: Ones Count 64 bit on Xilinx in VHDL

The application involved a 32x oversampled 28MHz PSK stream from a
powerline.
The logic ran at 28MHz behind a 32-tap analog DLL, so syncing was done by
looking for a correlation at each of 32 phases in parallel. With even
minor power line filtering, the bit edges are all over the place, making
it tough to say where bits start or end. Anyway, it was derived from an
ASIC design and BRAMs were not plentiful in those early Virtex parts.

If I did it today, I'd probably use N x faster clock on digital logic
with N x less HW and factor the N out of the oversampling front end
logic.

I still wouldn't use BRAMs today, I'd use them for other functions.
Using 63bits rather than 64 bits takes precisely 63 adder cells and
each doubling adds 2 adder delays (ASIC that is), and 64 takes an extra
6 on top. When did adders become expensive?

johnjakson at usa dot com

#21
07-22-2005, 10:50 AM
JJ
Re: Ones Count 64 bit on Xilinx in VHDL

< Using 63bits rather than 64 bits takes precisely 63 adder cells>

Should say
Using 63bits rather than 64 bits takes precisely 63-6 adder cells and
64 bits take 63.

It's been a while.
You probably didn't realize I was using 32x for oversampling

#22
08-25-2005, 07:45 PM
glen herrmannsfeldt
Re: Ones Count 64 bit on Xilinx in VHDL

Ray Andraka wrote:

(snip)

> What Peter described is going to be more clock cycle efficient because
> you use the BRAM in place of a wallace tree. His description isn't
> really a wallace tree because it doesn't have the same structure (no
> tree of carry-save adders, and the final outputs are complete sums of
> the bits for those BRAMs, not a carry vector and a sum vector like a
> wallace tree). You could use wallace trees to combine the results, from
> the BRAMs, but it isn't efficient in an FPGA.


Well, I might describe Peter's as using 12 bit wide (or is it deep)
carry save adders. I forget by now if he actually made a tree
out of it, though.

-- glen

#23
08-26-2005, 12:57 AM
Paul Marciano
Re: Ones Count 64 bit on Xilinx in VHDL

Here's a pipeline in Verilog - the theory should translate easily to
VHDL. I'm a beginner at Verilog, so forgive me if there's something
obviously wrong here.


module pop64(
    input             clk,
    input      [63:0] vec,
    output reg [6:0]  sum);

    reg [63:0] s1, s2, s3, s4, s5;

    always @(posedge clk)
    begin
        s1 <= (vec & 64'h5555555555555555) + ({ 1'b0, vec[63:1] } & 64'h5555555555555555);
        s2 <= (s1 & 64'h3333333333333333) + ({ 2'b0, s1[63:2] } & 64'h3333333333333333);
        s3 <= (s2 & 64'h0707070707070707) + ({ 4'h0, s2[63:4] } & 64'h0707070707070707);
        s4 <= (s3 & 64'h000f000f000f000f) + ({ 8'h0, s3[63:8] } & 64'h000f000f000f000f);
        s5 <= (s4 & 64'h0000001f0000001f) + ({ 16'h0, s4[63:16] } & 64'h0000001f0000001f);

        sum <= s5[37:32] + s5[5:0];
    end

endmodule


This adds pairs of bits, then adds the sums.. then those sums, etc:

// vec bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
// =>
// s1  aabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabbaabb
// =>
// s2  .aaa.bbb.aaa.bbb.aaa.bbb.aaa.bbb.aaa.bbb.aaa.bbb.aaa.bbb.aaa.bbb
// =>
// s3  ....aaaa....bbbb....aaaa....bbbb....aaaa....bbbb....aaaa....bbbb
// =>
// s4  ...........aaaaa...........bbbbb...........aaaaa...........bbbbb
// =>
// s5  ..........................aaaaaa..........................bbbbbb
// =>
// sum                                                          aaaaaaa
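
Since the original question asked for VHDL, the same mask-and-add pipeline might be translated roughly as follows. This is a sketch assuming the numeric_std package and VHDL-93 or later (for the hex literals on unsigned constants). The constant names M1 to M5 are invented here, and the port list simply mirrors the Verilog module.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity pop64 is
  port (
    clk : in  std_logic;
    vec : in  std_logic_vector(63 downto 0);
    sum : out unsigned(6 downto 0)   -- 0 to 64
  );
end entity pop64;

architecture rtl of pop64 is
  -- Field masks, one per stage (same values as the Verilog version).
  constant M1 : unsigned(63 downto 0) := x"5555555555555555";
  constant M2 : unsigned(63 downto 0) := x"3333333333333333";
  constant M3 : unsigned(63 downto 0) := x"0707070707070707";
  constant M4 : unsigned(63 downto 0) := x"000F000F000F000F";
  constant M5 : unsigned(63 downto 0) := x"0000001F0000001F";

  signal s1, s2, s3, s4, s5 : unsigned(63 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Each stage adds neighbouring fields, doubling the field width.
      s1 <= (unsigned(vec) and M1) + (shift_right(unsigned(vec), 1) and M1);
      s2 <= (s1 and M2) + (shift_right(s1, 2) and M2);
      s3 <= (s2 and M3) + (shift_right(s2, 4) and M3);
      s4 <= (s3 and M4) + (shift_right(s3, 8) and M4);
      s5 <= (s4 and M5) + (shift_right(s4, 16) and M5);

      -- Final stage: add the two 6-bit halves into a 7-bit result.
      sum <= resize(s5(37 downto 32), 7) + resize(s5(5 downto 0), 7);
    end if;
  end process;
end architecture rtl;
```

As in the Verilog, the result appears six clocks after the input, with a new count every clock. The adders are written 64 bits wide even though carries never cross a field boundary, so how much of that a synthesizer prunes away is tool dependent.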


Comments welcome.

Paul.
