Misc notes pertaining to further GB firmware development

Peter Wittich has set up a CVS repository, which can also be used from a Windows client.

From this CVS repository, you can 'cvs checkout ghostbuster'. Then you will see directories 'gb', 'gb0', 'utility', and 'processor'.

I don't currently remember what 'gb' is. 'gb0' is where I was most recently working. 'utility' is where future development on the 'utility' features of the GB could happen. 'processor' is where future development on the 'track processing' features of the GB could happen.

Currently in the 'gb0' area there are several different versions of 'gb.tdf'. 'gb_dupfilt.tdf' is the version of the firmware that is currently running in SVT to do the 'ghost busting'. 'gb_trackfilter.tdf' is a version of firmware that does beam subtraction (and also is smart enough to implement a simple two-track trigger algorithm--which is no longer needed). 'gb_october.tdf' is the version of 'gb_trackfilter.tdf' that we used in the September/October 2001 'SVT special runs'.

Currently in b0svt06 there is a GhostBuster whose three channels do three separate tasks, two related to 'track processing' and one 'utility' function. One channel subtracts the beam (and executes some no-longer-needed two-track evaluation). One channel picks only the 'best' SVT track for each XFT track. And one channel allows Marco's program to histogram the SVT processing time event-by-event.

Currently in b0svt07 there is a GhostBuster in which all three channels are programmed to do the 'utility' function. To date, this has been used to read out (into the event record) the output for a few Hit Finder boards that had a high level of SVT/svtsim discrepancy, to help us to narrow down the problem.

Longer term, it would be nice for all GB boards (we built 4 of them) to have the same set of firmware. Probably one channel should be dedicated to 'track processing' and two channels could be dedicated to 'utility' functions. Then for example the first channel could do beam subtraction, ghost removal, and tan(phi) mapping; the second channel could do SVTD bank readout and SVT timing measurement; and the third channel could be available for diagnostic readout by experts, e.g. to debug a data path that disagrees with the simulation.

The Altera APEX 20K200BC356-1XV chip that is the 'brain' of each GB channel (there are 3 channels per board) has about 13 KB of internal dual-ported SRAM that can be arranged in a wide variety of widths and depths. Right now the beam subtraction is done with a big lookup table that holds 10 bits of data for each of the 2^13 combinations of 3 bits of z and 10 bits of phi, or about 10 KB of data. If instead you use the Flash RAM to store a sine/cosine table and use the Altera logic for multiply/add, you can eliminate this memory usage. The tan(phi) map should also fit easily in the Flash RAM. The Flash RAM has 16 bits of data by 21 address bits, or 2M words (4 MB) of space. The full (z,phi) data make only 16 bits of address, leaving room for 32 banks of data. So you can easily have a bank for sine, cosine, atan, and whatever else seems useful.
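
The memory numbers above can be checked with a few lines of arithmetic. This is only a sketch: the packing in flash_address() (bank number in the top 5 address bits, (z,phi) index below it) and the bank assignment in BANKS are hypothetical, not the layout used on the board.

```python
# Memory-budget arithmetic for the numbers quoted above.
# flash_address() and BANKS are hypothetical, for illustration only.

LUT_DATA_BITS = 10     # bits stored per lookup-table entry
LUT_Z_BITS = 3         # z bits addressing the current table
LUT_PHI_BITS = 10      # phi bits addressing the current table

FLASH_ADDR_BITS = 21   # Flash RAM address bits
FULL_ZPHI_BITS = 16    # address bits taken by the full (z,phi) data


def lut_bytes():
    """Size in bytes of the current beam-subtraction lookup table."""
    entries = 1 << (LUT_Z_BITS + LUT_PHI_BITS)
    return entries * LUT_DATA_BITS // 8


def flash_banks():
    """Number of full (z,phi) banks that fit in the Flash RAM."""
    return 1 << (FLASH_ADDR_BITS - FULL_ZPHI_BITS)


def flash_address(bank, zphi):
    """Pack a bank number and a 16-bit (z,phi) index into one address."""
    if not 0 <= bank < flash_banks():
        raise ValueError("bank out of range")
    if not 0 <= zphi < (1 << FULL_ZPHI_BITS):
        raise ValueError("(z,phi) index out of range")
    return (bank << FULL_ZPHI_BITS) | zphi


# Hypothetical bank assignment.
BANKS = {"sine": 0, "cosine": 1, "atan": 2}

if __name__ == "__main__":
    print(lut_bytes())                                  # 10240
    print(flash_banks())                                # 32
    print(hex(flash_address(BANKS["cosine"], 0x1234)))  # 0x11234
```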

The current Ghost Busting algorithm uses about 6 KB of RAM (or about half). There are 288 possible XFT track numbers (linker chip ID). Note that 288 is just 12.5% larger than a power of two (256), so using a 9-bit address (with one address per track) would leave 44% of the addressed RAM unused (224 of 512 locations). Note that 288*7 = 2016 is 98.4% of a power of two (2048), so by using an 11-bit address (with one address per word per track--an SVT track has 7 words), one makes optimal use of the internal RAM and has 7 KB left over for other possible applications. (For example, you could consider implementing a 2K word spy buffer in the 'processor' chip if you are feeling ambitious; or some of the spare RAM could be used for Mel's mapping function from (hit quality, chisq) to rank for duplicate removal.)
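
The packed addressing scheme can be sketched as follows. The function names are made up for illustration, and the real firmware may well compute the address differently (a multiply by 7 is just a shift and a subtract, (xft_id << 3) - xft_id).

```python
# Packed "best track" RAM addressing: one address per word per track.
# Function names are hypothetical; only the numbers come from the notes.

N_XFT_TRACKS = 288     # possible XFT track numbers (linker chip ID)
WORDS_PER_TRACK = 7    # an SVT track has 7 words
ADDR_BITS = 11         # packed address width


def ram_address(xft_id, word):
    """Address of one word of one track in the packed layout."""
    if not 0 <= xft_id < N_XFT_TRACKS:
        raise ValueError("XFT track number out of range")
    if not 0 <= word < WORDS_PER_TRACK:
        raise ValueError("word index out of range")
    return xft_id * WORDS_PER_TRACK + word


def utilization(used, addr_bits):
    """Fraction of a 2**addr_bits address space that is actually used."""
    return used / (1 << addr_bits)


if __name__ == "__main__":
    # Highest address used by the packed layout fits in 11 bits.
    print(ram_address(N_XFT_TRACKS - 1, WORDS_PER_TRACK - 1))      # 2015
    # One address per track with a 9-bit address: 56.25% used.
    print(utilization(N_XFT_TRACKS, 9))                            # 0.5625
    # One address per word per track with an 11-bit address.
    print(utilization(N_XFT_TRACKS * WORDS_PER_TRACK, ADDR_BITS))  # 0.984375
```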

Probably one of the most important things to do is to finalize the VME address map on the GB board, make sure there is never logic contention between the three processor chips for the data bus, and make sure the svtvme library (maintained mostly by Stefano Belforte) is aware of the GB address space. This will be much easier to manage if you adopt a convention that e.g. the top channel on each board is always the 'processor' channel.

Here was a plan for crate 6/7 cleanup that was intimately connected to the plan for further GB firmware development. One needs to decide at this point how much diagnostic functionality really is needed.

2002/07/08 plan for GB development

The first step is to compile the GB design that is currently in the 'processor' directory. Then you need to learn how to send fake data through (a) the virtual board in the Altera simulator, (b) the virtual board in the Mentor Graphics simulator, and (c) the real board in the test stand.

(a) is the easiest and is also where you will do 90% of the work. But it does not properly model the interaction between the main Altera chip and the FRAM, the input FIFO, or the VME interface. So I would start with (a), using a reasonable guess for the FRAM behavior. The FIFO behavior is easy to model adequately, and I think I already have an example that sends some tracks into the chip. I also have an example of VME I/O.

(b) is important for checking that the FRAM I/O works. Probably you do not need it for anything else.

(c) is the way you test out the real board with a large quantity of fake data, comparing the output of the board against the output of a computer program that predicts the board's correct output. This is actually much, much easier than (a) or (b), but you learn much less from the real board than you do from the simulator when you're trying to figure out whether or not your firmware program is correct.

Currently there is code in svtsim (svtsim_gb.c) that models the GB's duplicate track filtering. It should be very easy for us to write a short C program to generate some pseudo-random input tracks, run them through the emulator program, and write out the expected output tracks. Then you can compare this with the performance of the Altera simulation.
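
As a rough picture of that workflow, here is a sketch in Python rather than C. It is not a port of svtsim_gb.c: the track representation (an (xft_id, chisq) pair), the 10-bit chisq range, the "lowest chisq wins, earliest wins ties" rule, and the output ordering by XFT ID are all assumptions made for illustration.

```python
import random

N_XFT_TRACKS = 288   # possible XFT track numbers
CHISQ_BITS = 10      # assumed chisq width, for illustration only


def make_tracks(n, seed=0):
    """Generate n pseudo-random (xft_id, chisq) tracks, reproducibly."""
    rng = random.Random(seed)
    return [(rng.randrange(N_XFT_TRACKS), rng.randrange(1 << CHISQ_BITS))
            for _ in range(n)]


def best_per_xft(tracks):
    """Keep one track per XFT ID: lowest chisq, earliest wins ties."""
    best = {}
    for trk in tracks:
        xft_id, chisq = trk
        if xft_id not in best or chisq < best[xft_id][1]:
            best[xft_id] = trk
    return [best[k] for k in sorted(best)]


def first_mismatch(expected, observed):
    """Index of the first difference between two lists, or None."""
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            return i
    if len(expected) != len(observed):
        return min(len(expected), len(observed))
    return None


if __name__ == "__main__":
    tracks = make_tracks(1000, seed=42)
    expected = best_per_xft(tracks)
    # In real use 'observed' would be parsed from the simulator output.
    observed = list(expected)
    print(first_mismatch(expected, observed))   # None
```

A fixed seed keeps the input tracks reproducible, so a mismatch seen in the simulator can be regenerated and studied later.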

The first modification should be to delete a bunch of the junk code from the existing gb.tdf (I'll help you decide what's junk) and then make sure the simulation still works.

The next step is to teach svtsim_gb.c how to do the tan(phi) mapping. Once you are happy with that, you need to teach the firmware how to do this mapping. Once you think that works, you can test the Altera simulation against the C program.

The tan(phi) mapping in full generality would require a 13-bit to 13-bit map. But it is a small correction, and the difference between input phi and output phi fits in (3+sign) bits. So you need a 13-bit to 4-bit map, and a 13-bit add/subtract unit.
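
A minimal sketch of that map-plus-adder arrangement is below. Two things are assumed here and not taken from the firmware: the 4-bit code is sign-magnitude (sign in bit 3), and the 13-bit sum wraps around modulo 2**13.

```python
PHI_BITS = 13
PHI_MASK = (1 << PHI_BITS) - 1


def encode_delta(delta):
    """Encode a correction in -7..+7 as a sign bit plus 3-bit magnitude."""
    if not -7 <= delta <= 7:
        raise ValueError("correction does not fit in 3 bits plus sign")
    return (0x8 if delta < 0 else 0x0) | abs(delta)


def decode_delta(code):
    """Inverse of encode_delta for a 4-bit code."""
    magnitude = code & 0x7
    return -magnitude if code & 0x8 else magnitude


def correct_phi(phi, table):
    """Look up the 4-bit correction for a 13-bit phi and add it (mod 2**13)."""
    return (phi + decode_delta(table[phi])) & PHI_MASK


if __name__ == "__main__":
    # A table of zero corrections, with two entries changed by hand.
    table = [encode_delta(0)] * (1 << PHI_BITS)
    table[100] = encode_delta(-3)
    table[0] = encode_delta(-1)
    print(correct_phi(100, table))   # 97
    print(correct_phi(0, table))     # 8191 (wraps around)
    print(correct_phi(200, table))   # 200 (no correction)
```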

As each track comes in, you need to latch its phi value. This is done by noticing (at stage B) that wca[]==0, just as the XFT ID is latched by noticing that wca[]==6. This gives you the address for the FRAM. Then, by stage F or G, you should be able to latch in the answer from the FRAM. Then you can apply the correction. Currently the data are written to the "best track" RAM at stage K, so you have 4 clock cycles to add the correction to phi--should be easy.
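
The latching and the cycle count can be pictured with a small behavioral model. It is a sketch only: it latches whole words rather than the actual bit fields, and it assumes one clock cycle per lettered pipeline stage.

```python
WORDS_PER_TRACK = 7
PHI_WORD = 0    # phi is latched when the word counter wca is 0
XFT_WORD = 6    # the XFT ID is latched when wca is 6


def latch_fields(words):
    """Walk the 7 words of one track and latch the phi and XFT ID words."""
    if len(words) != WORDS_PER_TRACK:
        raise ValueError("a track has exactly 7 words")
    latched = {}
    for wca, word in enumerate(words):
        if wca == PHI_WORD:
            latched["phi"] = word
        elif wca == XFT_WORD:
            latched["xft_id"] = word
    return latched


def cycles_between(start_stage, end_stage):
    """Clock cycles from one lettered stage to a later one (1 per stage)."""
    return ord(end_stage) - ord(start_stage)


if __name__ == "__main__":
    print(latch_fields([0x123, 1, 2, 3, 4, 5, 0x45]))  # {'phi': 291, 'xft_id': 69}
    print(cycles_between("G", "K"))   # 4
    print(cycles_between("F", "K"))   # 5
```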

You probably want to use one of the spare "control" bits to enable/disable the tan(phi) correction. You would always compute the correction, but when the bit says the correction is disabled, you do not overwrite the value with the corrected value. This makes it simple to do a before/after check that this one feature was implemented properly.
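
In software terms the control bit is just an output selector, which is what makes the before/after check cheap. A sketch, with hypothetical function names and assuming the sum wraps modulo 2**13:

```python
PHI_MASK = (1 << 13) - 1


def output_phi(phi, delta, enable):
    """Always compute the corrected phi; the flag only selects the output."""
    corrected = (phi + delta) & PHI_MASK
    return corrected if enable else phi


def changed_indices(phis, deltas):
    """Before/after check: which tracks differ when the flag is flipped."""
    return [i for i, (p, d) in enumerate(zip(phis, deltas))
            if output_phi(p, d, True) != output_phi(p, d, False)]


if __name__ == "__main__":
    print(output_phi(100, -3, enable=False))            # 100
    print(output_phi(100, -3, enable=True))             # 97
    print(changed_indices([10, 20, 30], [0, 2, -1]))    # [1, 2]
```

Only tracks with a nonzero correction should change between the two runs; anything else that changes points to a bug outside the correction itself.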

Once this works in the Altera simulation, we can try steps (b) and (c). The nice thing about (c) is that you can run a huge amount of data overnight through the real board and check it with a program, so it is a more exhaustive test than you can do with the simulator.

Once this works, we can implement the new beam subtraction algorithm. You had a very good idea to use the other FRAMs on the board (since there are three) to give effectively a wider data bus. Since there are some bussed lines connecting the Altera chips to each other, this should be easy to implement. Let's try it. Then the latency for calculating the corrections for each track should be less than the 7 clock cycles it takes to read in the track, whereas using a single FRAM would require at least 2*4=8 clock cycles of latency.
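
The latency argument is simple enough to write down. The sketch below takes the 4 cycles per FRAM lookup from the 2*4=8 estimate above and ignores any extra cycles needed to move data between chips over the bussed lines, which is an assumption.

```python
TRACK_CYCLES = 7     # clock cycles to read in one track
LOOKUP_CYCLES = 4    # clock cycles per FRAM lookup


def serial_latency(n_lookups, cycles=LOOKUP_CYCLES):
    """Latency when all lookups go through a single FRAM, one after another."""
    return n_lookups * cycles


def parallel_latency(n_lookups, n_frams, cycles=LOOKUP_CYCLES):
    """Latency when lookups are spread over several FRAMs read in parallel."""
    rounds = -(-n_lookups // n_frams)   # ceiling division
    return rounds * cycles


if __name__ == "__main__":
    print(serial_latency(2))                        # 8
    print(parallel_latency(2, 3))                   # 4
    print(parallel_latency(2, 3) < TRACK_CYCLES)    # True
```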

Next: I need to describe the current beam-subtraction procedure and the desired new procedure, in detail.

One thing that came up at the SVT workshop is valid vs invalid SVT beam spot. It would be useful for SVTD to contain a "beam fit valid" word. Then run sections could be marked bad if the beam fit is not valid. This would avoid the problem of tossing out a whole run because SVT didn't know the beam position until a few minutes into the run.

Running QuickSim: Here is how I do it. You can probably just change the references to my account into references to your account.
log into edg.uchicago.edu
edit qsim_gb.py to make whatever changes you wish
run it with 'python qsim_gb.py gb'; it will write qsim_gb.ff
(I think this only works on edg, because edg is a Linux machine, and I know that Python is installed on all Red Hat Linux machines)
log into frodo.uchicago.edu
cd /designs/CDF_SVT
execute the script 'qgb'
(it refers to qsim_gb.ff in my account, so you may want to change that; it will start QuickSim and attempt to run it using qsim_gb.ff as the source of waveform data)
(if you re-run the Python program, you can re-run the simulation without quitting by typing 'dof /net/user/users/ashmansk/qsim_gb.ff' to QuickSim, with the input focus in the main QuickSim window)

Downloading firmware:
go to sport
cdf/......
open maxplus 10.0
choose max+plus/programmer
(repeat the following two steps when you recompile program)
choose JTAG/restore jcf/'e:\users\cdf\gb_taka.jcf'
choose configure
(do the following to copy the program onto sport)
open winscp
host=impulse
local directory=e:\users\cdf
remote directory=ghostbuster/processor
copy 'gb.sof' from remote to local
then redo 'restore jcf' and 'configure' above