INTRODUCTION
The performance of a chip built from synchronous blocks is directly proportional to its clock frequency, and the clock net is the most active and the longest network on the chip. The maximum clock frequency at which a chip operates is determined by the longest path the clock net takes between the clock source and its terminating sink. This is called the critical path of the clock net, and it has to be routed with great precision. Several factors must be considered when designing a clock distribution network: the capacitance and resistance of the metal layers, the type of sink driven by the clock net, and cross-talk and noise between nets. The difference in the arrival times of the clock signal at different sinks, defined as the clock skew, must also be considered. In addition, buffers must be placed to reduce the skew between the signals and to keep the clock signal free of distortion. This requires buffers of different sizes, which add to the total power consumption of the design and, through the increased transistor count, to the area occupied by the clock distribution network. Routing the clock nets manually would be an exhausting task given the design constraints and the tape-out schedule. Hence various algorithms have been developed to tackle these problems and to synthesize the clock distribution network.
A clock distribution network can be built with one of two structures: a clock tree or a clock mesh.
Clock tree – This is organized like a tree, branching from the clock source to the various terminating points of the design, which form the clock sinks. In a clock tree, sinks share a common path only near the buffers closest to the clock source, so very little of the path from each sink back to the source is shared with other sinks. Buffers are placed at every launch node branching from the main path as the tree descends towards the sinks, so the clock tree has a large buffer depth, and as this depth increases, the path sharing between the clock source and the sinks decreases. Path sharing matters because process variation on a shared segment delays every sink downstream of it equally and therefore adds no skew between those sinks; the more the sinks share their paths to the source, the smaller the effect of on-chip variation on the distribution. The clock tree is adversely affected by this, since few sinks share a common path to the source.
Clock mesh – The clock mesh is structured as a grid, which may be organized in either a uniform or a non-uniform fashion. Each window formed by the interconnecting grid contains clock sinks, which may be grouped based on factors such as capacitance, resistance or both. The non-uniformity results from grouping the sinks unevenly into windows according to such a constraint. Buffers are placed at the nodes of the clock mesh and are distributed according to the density of the sinks in each window. Because the grid connects every node to every other node, the sinks share much more of their path to the source. The buffer depth is therefore very shallow, possibly a single stage, and process variations introduce very little skew.
The clock tree consumes less power than the clock mesh, because the mesh has a greater wire length and hence a larger total wire capacitance, which leads to higher power consumption. However, as the technology node shrinks and the logic becomes denser, skew due to process variations becomes more significant in a clock tree than in a clock mesh. Hence the clock mesh is preferred in deep sub-micron technologies.
In this work we study the following aspects of clock mesh synthesis for deep sub-micron technologies:
Formation of a non-uniform clock mesh based on a given target capacitance for the clock sinks inside every window of the mesh.
Connection of the clock sinks to the nearest edge of their mesh window using stubs, and calculation of the total wire length for different target capacitances.
Placement of buffers of different sizes according to the total sink capacitance in a given window.
Variation of the voltage and frequency of the clock applied to the mesh, and calculation of the total skew in the mesh.
The rest of the paper is organized as follows:
The first section will contain the algorithm to form the non-uniform mesh and the connection of the clock sinks using stubs.
The second section will contain the procedure for the placement of the buffers.
The third section will contain the study of the mesh for differential voltage and frequency.
The fourth section will contain the simulation of the mesh to obtain the skew and the power consumed.
In our study we have used the XXXXXXX benchmark of Intel based on the 45nm technology.
MESH STRUCTURING
The traditional methodology forms a uniform mesh and iteratively increases the number of mesh windows until a negligible skew is obtained. The edges of the mesh are then slid so that the lengths of the stubs connecting the sinks to the mesh edges are minimized. Although this yields a non-uniform mesh, these approaches structure the mesh wires only with respect to a given skew and not with respect to the cluster density of the sinks. This increases the total wire length of the mesh, which increases the mesh capacitance and hence the total power consumed.
Hence this paper uses a more efficient structuring technique that considers the cluster density of the sinks. Here the sinks are clustered into mesh windows such that the sum of the sink capacitances in each window is less than a predefined target capacitance. We consider target capacitances ranging from 25fF to 150fF to form non-uniform mesh windows. The structuring of the mesh is done in MATLAB.
The whole procedure follows a DIVIDE and CONQUER principle. The algorithm starts with the entire benchmark plot and computes the sum of the capacitances of its sinks. If this sum is greater than a predefined capacitance, the area is divided into four equal windows (quadrants), the sum of the capacitances in each is computed again, and the procedure repeats until every window has a total capacitance less than the predefined capacitance.
To start with, the extreme coordinates of the benchmark plot, namely Xmax, Ymax, Xmin and Ymin, are considered. These are the boundaries within which the sum of the capacitances of the sinks is calculated; initially they span the entire benchmark plot. If the sum of the capacitances is greater than the predefined capacitance value, the plot is divided into four quadrants or windows, and the boundary coordinates are updated to the extremities of each of the four quadrants. This forms the DIVIDE phase. The window size is also kept to no more than 0.3mm; if this limit is exceeded, the window is again subdivided into four quadrants. The procedure continues in all four quadrants until the sum of the capacitances of the sinks in each window is equal to or less than the target capacitance. This forms the CONQUER phase. At the end of the procedure, a non-uniform mesh is formed in which every window contains sinks whose total capacitance does not exceed the target value. Note that even though the mesh is formed according to sink capacitances, the density of the sinks varies from window to window.
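The recursive partitioning described above can be sketched in Python as follows. This is not the MATLAB implementation used in this work but a minimal sketch under stated assumptions: sink positions are in mm and capacitances in fF, the 0.3mm limit is interpreted as the maximum side length of a window, sinks lying on a quadrant boundary are assigned to the upper or right quadrant, and the names `Sink`, `partition`, `bounding_window` and `min_size` are hypothetical. The `min_size` guard stops the recursion when a single sink alone exceeds the target capacitance.

```python
from dataclasses import dataclass
from typing import List, Tuple

Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in mm


@dataclass(frozen=True)
class Sink:
    x: float    # x position in mm
    y: float    # y position in mm
    cap: float  # sink capacitance in fF


def bounding_window(sinks: List[Sink]) -> Window:
    """Extreme coordinates (Xmin, Ymin, Xmax, Ymax) of the benchmark plot."""
    return (min(s.x for s in sinks), min(s.y for s in sinks),
            max(s.x for s in sinks), max(s.y for s in sinks))


def partition(win: Window, sinks: List[Sink], target_cap: float,
              max_size: float = 0.3,
              min_size: float = 1e-3) -> List[Tuple[Window, List[Sink]]]:
    """Quadtree split until each window holds <= target_cap of sink capacitance
    and no window side exceeds max_size. Returns (window, sinks) pairs."""
    xmin, ymin, xmax, ymax = win
    total = sum(s.cap for s in sinks)
    size = max(xmax - xmin, ymax - ymin)
    # Stop when both constraints hold, or when the window is too small to split
    # (a single sink larger than target_cap would otherwise recurse forever).
    if (total <= target_cap and size <= max_size) or size <= min_size:
        return [(win, sinks)]
    # DIVIDE: four equal quadrants.
    xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
    quads = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
             (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
    buckets: List[List[Sink]] = [[], [], [], []]
    for s in sinks:
        # Each sink goes to exactly one quadrant; ties go up/right.
        buckets[(1 if s.x >= xm else 0) + (2 if s.y >= ym else 0)].append(s)
    # CONQUER: recurse into each quadrant independently.
    result: List[Tuple[Window, List[Sink]]] = []
    for quad, bucket in zip(quads, buckets):
        result.extend(partition(quad, bucket, target_cap, max_size, min_size))
    return result
```

For example, `partition(bounding_window(sinks), sinks, target_cap=50.0)` partitions a sink list for a 50fF target; calling it once per target capacitance between 25fF and 150fF would give the family of meshes compared in this work.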
This observation leads to the next part of the algorithm: connecting the sinks to the nearest mesh edge using stubs. For every sink, the distance to each of the four edges of its window is calculated using the distance formula, and the sink is connected to the nearest edge by a wire called a stub. Note that the distance considered is the actual (Euclidean) distance and not the Manhattan distance.
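A possible Python sketch of the stub step is shown below; the function names and the (xmin, ymin, xmax, ymax) window layout are hypothetical. It uses the Euclidean point-to-segment distance, as the text specifies. For a sink inside its window the closest point on each edge is the perpendicular foot, so the returned stub is the perpendicular from the sink to the chosen edge.

```python
import math
from typing import Tuple

Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in mm
Point = Tuple[float, float]


def point_segment_distance(px: float, py: float, ax: float, ay: float,
                           bx: float, by: float) -> Tuple[float, Point]:
    """Euclidean distance from (px, py) to segment A-B, and the closest point."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate edge
        return math.hypot(px - ax, py - ay), (ax, ay)
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), (cx, cy)


def nearest_edge_stub(px: float, py: float,
                      win: Window) -> Tuple[str, float, Point]:
    """Return (edge name, stub length, tap point on the edge) for one sink."""
    xmin, ymin, xmax, ymax = win
    edges = {
        "bottom": (xmin, ymin, xmax, ymin),
        "top": (xmin, ymax, xmax, ymax),
        "left": (xmin, ymin, xmin, ymax),
        "right": (xmax, ymin, xmax, ymax),
    }
    best = None
    for name, (ax, ay, bx, by) in edges.items():
        d, tap = point_segment_distance(px, py, ax, ay, bx, by)
        if best is None or d < best[1]:
            best = (name, d, tap)
    return best
```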
Once the clock mesh is formed and the stubs are connected to the sinks, the total wire length is calculated. This length is measured from an origin point of the mesh to the end of each stub, that is, to the sink. The algorithmic flow chart of the mesh structuring is shown in the figure.
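The text measures the total wire length from an origin of the mesh to each stub end. The Python sketch below instead uses a simpler, hypothetical definition: the length of all unique mesh edges (edges shared by adjacent windows are counted once) plus the sum of the stub lengths. The function names and the rounding used to group collinear edges are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in mm


def merged_length(intervals: List[Tuple[float, float]]) -> float:
    """Length covered by 1-D intervals, counting overlaps only once."""
    total = 0.0
    cur_lo = cur_hi = None
    for lo, hi in sorted(intervals):
        if cur_hi is None or lo > cur_hi:      # disjoint: close the current run
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                                  # overlapping or touching
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


def mesh_wire_length(windows: Iterable[Window], ndigits: int = 9) -> float:
    """Total length of unique mesh edges; shared edges are counted once."""
    horiz: Dict[float, List[Tuple[float, float]]] = defaultdict(list)  # y -> x spans
    vert: Dict[float, List[Tuple[float, float]]] = defaultdict(list)   # x -> y spans
    for xmin, ymin, xmax, ymax in windows:
        for y in (ymin, ymax):
            horiz[round(y, ndigits)].append((xmin, xmax))
        for x in (xmin, xmax):
            vert[round(x, ndigits)].append((ymin, ymax))
    return (sum(merged_length(s) for s in horiz.values())
            + sum(merged_length(s) for s in vert.values()))


def total_wire_length(windows: Iterable[Window],
                      stub_lengths: Iterable[float]) -> float:
    """Hypothetical definition: unique mesh edge length plus all stub lengths."""
    return mesh_wire_length(windows) + sum(stub_lengths)
```

For a unit square split into four quadrants this gives 6 (the outer perimeter of 4 plus the two internal 1mm lines), whereas summing the four window perimeters would double-count the internal edges and give 8.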
BUFFER PLACEMENT
Buffers are required for three purposes. The first is to maintain the integrity of the clock signal. The second is to reduce the load capacitance seen by the clock source. The last, and the most important for this work, is to reduce the skew by introducing equal delays to the sinks.
To achieve an almost zero skew by introducing delays in the clock paths to the sinks, buffers of different sizes must be used depending on the capacitance of the sinks. This requires a buffer table of buffer sizes indexed by the total sink capacitance on each path (here, the sum of all the sink capacitances inside a mesh window). A small buffer is placed at the node of a mesh window whose total sink capacitance is less than a certain minimum value, and a relatively larger buffer is placed where the sink capacitance falls between capacitance bounds stored in the buffer table. We use eight buffers of different sizes, each associated with a predetermined capacitance range, to form the buffer table. The buffer placement is also done in MATLAB. The procedure is as follows.
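The buffer table can be represented as a sorted list of capacitance upper bounds, one per buffer, searched with binary search. In the Python sketch below, the first three bounds (25, 50 and 72, taken as fF) follow the values quoted in this section, while the remaining five bounds and the name `select_buffer` are hypothetical placeholders, not values from this work.

```python
import bisect
from typing import List

# Upper bound (fF, exclusive) of the capacitance range served by B1..B8.
# 25, 50 and 72 follow the text; the remaining bounds are HYPOTHETICAL.
BUFFER_UPPER_BOUNDS_FF: List[float] = [25.0, 50.0, 72.0, 100.0, 150.0,
                                       200.0, 300.0, 400.0]
BUFFER_NAMES: List[str] = [f"B{i}" for i in range(1, 9)]


def select_buffer(total_cap_ff: float) -> str:
    """Return the buffer whose range contains total_cap_ff.

    B1 serves caps below 25 fF, B2 serves [25, 50), B3 serves [50, 72), ...
    """
    idx = bisect.bisect_right(BUFFER_UPPER_BOUNDS_FF, total_cap_ff)
    if idx >= len(BUFFER_NAMES):
        raise ValueError(f"{total_cap_ff} fF exceeds the largest buffer range")
    return BUFFER_NAMES[idx]
```

Raising an error for loads above the last bound, rather than silently returning the largest buffer, makes it visible when a window (or a merged pair of windows) needs further subdivision.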
The centroid of a mesh window is calculated from the capacitances of the sinks inside the window, using the formula given below:
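$$(x_{cap},\, y_{cap}) = \left( \frac{\sum_i \mathrm{cap}_i \, x_{edge,i}}{\sum_i \mathrm{cap}_i},\ \frac{\sum_i \mathrm{cap}_i \, y_{edge,i}}{\sum_i \mathrm{cap}_i} \right)$$
where the sums run over the sinks $i$ in the window, $\mathrm{cap}_i$ is the capacitance of sink $i$, and $(x_{edge,i}, y_{edge,i})$ is the point where its stub meets the window edge.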
After the centroid is calculated, the buffer is placed at the mesh node closest to the centroid coordinates. The size of the buffer is taken from the buffer table formed earlier, by comparing the total capacitance inside the window with the predefined values stored in the table. Hence if the sum of the capacitances is less than 25fF, buffer B1 is placed on the mesh node nearest to the centroid coordinates. If the capacitance is between 25fF and 50fF, or between 50fF and 72fF, buffer B2 or B3 respectively is placed, and so on.
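A minimal Python sketch of the centroid and node-snapping steps follows. The coordinates passed in play the role of the x(edge) and y(edge) terms of the centroid formula; the list of candidate mesh nodes (for instance, the four corners of the window) and the function names are assumptions made for illustration. Ties between equidistant nodes are broken by list order.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def cap_weighted_centroid(points: List[Point], caps: List[float]) -> Point:
    """Capacitance-weighted centroid: sum(cap * x) / sum(cap), same for y."""
    total = sum(caps)
    if total <= 0.0:
        raise ValueError("window has no sink capacitance")
    x = sum(c * px for c, (px, _) in zip(caps, points)) / total
    y = sum(c * py for c, (_, py) in zip(caps, points)) / total
    return x, y


def nearest_node(target: Point, nodes: Iterable[Point]) -> Point:
    """Mesh node closest (Euclidean) to the target point."""
    tx, ty = target
    return min(nodes, key=lambda n: (n[0] - tx) ** 2 + (n[1] - ty) ** 2)
```

For two sinks of 1fF and 3fF at (0, 0) and (1, 0), the centroid is pulled towards the heavier sink at (0.75, 0), so the buffer snaps to the corner node (1, 0).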
If the centroids of two mesh windows are nearest to the same mesh node, the capacitances of the two windows are added, and an equivalent buffer is selected from the table and placed at that node.
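The merging rule can be sketched as grouping windows by the node their centroid snaps to and summing the capacitances per node. In this hypothetical Python helper, the capacitance-to-buffer lookup is passed in as a function, so any implementation of the buffer table can be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple


def assign_buffers(windows: List[Tuple[Hashable, float]],
                   lookup: Callable[[float], str]) -> Dict[Hashable, Tuple[float, str]]:
    """windows: (snapped_node, total_cap_ff) per mesh window.

    Windows that snap to the same node share one buffer sized for the sum of
    their capacitances. Returns {node: (combined_cap_ff, buffer_name)}.
    """
    per_node: Dict[Hashable, float] = defaultdict(float)
    for node, cap in windows:
        per_node[node] += cap
    return {node: (cap, lookup(cap)) for node, cap in per_node.items()}
```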
The buffer placement does not stop once buffers have been placed for all sinks; it is an iterative process, as shown later in this paper. The buffers are further shuffled and reordered until the skew and power requirements are met, because the wire capacitance adds to the total capacitance and increases the power consumed. Buffer placement is therefore treated as an NP-complete problem.
SIMULATION OF THE MESH IN NGSPICE
Once the buffers are placed, a netlist containing the placement and sizes of the buffers and the capacitances they drive is generated in MATLAB. The netlist is saved with the *.cir extension, one component per line, using the syntax shown below:
[Component Name] [positive Node] [negative Node] [Value]
The mesh may be modeled as a delta-connected network (Δ-connected) or a star-connected network (Y-connected). This paper uses the delta-connected model of the mesh, together with the stub connections, to build the netlist file.
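A hypothetical Python sketch of netlist generation in the `[Component Name] [positive Node] [negative Node] [Value]` syntax is shown below. As simplifying assumptions, each mesh segment or stub is modeled as a single resistor with its wire capacitance split equally to ground at both ends (a π model), each sink is a grounded capacitor, node `0` is ground, and the per-millimetre resistance and capacitance defaults are placeholders rather than 45nm values. Sources, buffer models and analysis commands would be appended before simulation.

```python
from typing import List, Tuple


def build_netlist(segments: List[Tuple[str, str, float]],
                  sinks: List[Tuple[str, float]],
                  r_per_mm: float = 100.0,      # ohm/mm, HYPOTHETICAL
                  c_per_mm: float = 200e-15,    # F/mm, HYPOTHETICAL
                  title: str = "clock mesh") -> List[str]:
    """segments: (node_a, node_b, length_mm) for mesh edges and stubs.
    sinks: (node, capacitance_in_farads). Returns SPICE element lines."""
    lines = [f"* {title}"]  # the first line of a SPICE deck is its title
    n_r = n_c = 0
    for a, b, length in segments:
        n_r += 1
        lines.append(f"R{n_r} {a} {b} {r_per_mm * length:.6g}")
        half_c = c_per_mm * length / 2  # pi model: half the wire cap per end
        for node in (a, b):
            n_c += 1
            lines.append(f"C{n_c} {node} 0 {half_c:.6g}")
    for node, cap in sinks:
        n_c += 1
        lines.append(f"C{n_c} {node} 0 {cap:.6g}")
    return lines
```

Joining the returned lines with newlines gives text that can be written to a *.cir file.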
Once the netlist is generated and saved in *.cir format, it is imported into NGSPICE, where a parametric analysis is performed for different values of voltage and frequency, targeting a 45nm technology.
A netlist is generated for each of the four quadrants; each is imported into NGSPICE in turn and simulated. We calculate the latency of the clock signal inside each window of a given quadrant.
The buffer nodes are first probed, and external clock pulses are then applied to them; that is, external contacts are made to the nodes where the buffers are placed, and the clock input is applied as variable pulses. The parametric analysis is done in three steps. First, we obtain the latency at the clock sinks by varying the voltage while keeping the frequency of the clock pulses constant. Next, we vary the frequency while keeping the voltage constant. Finally, we vary both the voltage and the frequency of the clock pulses and obtain the latency values at the sinks.
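The three-step sweep and the clock stimulus can be sketched in Python as below. The sketch writes an ngspice `PULSE(V1 V2 TD TR TF PW PER)` source with a 50% duty cycle; the rise/fall time as a fraction of the period, the voltage and frequency values in the usage example, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def clock_source(name: str, node: str, vdd: float, freq_hz: float,
                 edge_frac: float = 0.05) -> str:
    """ngspice pulse source swinging 0..vdd at freq_hz with 50% duty cycle."""
    period = 1.0 / freq_hz
    tr = tf = edge_frac * period
    # PW excludes the edges; PW + (TR + TF) / 2 = half a period at the 50% level.
    pw = period / 2 - tr
    return (f"{name} {node} 0 PULSE(0 {vdd:.6g} 0 {tr:.6g} {tf:.6g} "
            f"{pw:.6g} {period:.6g})")


def sweep_plan(voltages: List[float], freqs: List[float], v_nom: float,
               f_nom: float) -> Tuple[List[Tuple[float, float]], ...]:
    """(voltage, frequency) points for the three analysis steps."""
    vary_v = [(v, f_nom) for v in voltages]        # step 1: f fixed
    vary_f = [(v_nom, f) for f in freqs]           # step 2: V fixed
    vary_both = list(product(voltages, freqs))     # step 3: full grid
    return vary_v, vary_f, vary_both
```

For example, `sweep_plan([0.9, 1.0, 1.1], [1e9, 2e9], 1.0, 1e9)` yields three, two and six operating points, and each point produces one `clock_source` line per simulation run.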
After the analysis, the latencies of the mesh sinks in every quadrant for the different voltages and frequencies are imported back into MATLAB in the *.raw format; reading this format in MATLAB requires an HSPICE toolbox for MATLAB. The data is arranged as a matrix in which the first row contains the latency values of all the sinks at each voltage and frequency, and the subsequent rows contain the voltage at every sink. From these data, the skew is calculated as the difference between the latencies of two sinks s1 and s2:
$$\mathrm{skew} = \mathrm{latency}(s_1) - \mathrm{latency}(s_2)$$
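One way to obtain latencies and skew from the simulated waveforms is sketched below in Python. As assumptions not stated in the text, latency is measured between the 50% crossings of the clock source and of the sink on the first rising edge, using linear interpolation between samples, and the worst-case skew of a set of sinks is taken as the largest minus the smallest latency; all names are hypothetical.

```python
from typing import Dict, Sequence


def crossing_time(t: Sequence[float], v: Sequence[float],
                  threshold: float) -> float:
    """First time v rises through threshold, by linear interpolation."""
    for i in range(1, len(t)):
        if v[i - 1] < threshold <= v[i]:
            frac = (threshold - v[i - 1]) / (v[i] - v[i - 1])
            return t[i - 1] + frac * (t[i] - t[i - 1])
    raise ValueError("signal never crosses threshold")


def latency(t: Sequence[float], v_src: Sequence[float],
            v_sink: Sequence[float], vdd: float) -> float:
    """Delay between the 50% crossings of the clock source and a sink."""
    th = 0.5 * vdd
    return crossing_time(t, v_sink, th) - crossing_time(t, v_src, th)


def skew(lat_s1: float, lat_s2: float) -> float:
    """skew = latency(s1) - latency(s2)."""
    return lat_s1 - lat_s2


def max_skew(latencies: Dict[str, float]) -> float:
    """Worst-case skew over all sink pairs."""
    return max(latencies.values()) - min(latencies.values())
```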
Note that if the skew obtained is greater than the target skew, we must return to the buffer placement step, where the buffers are reordered according to the buffer table values and the SPICE netlist is generated again. This netlist is once again imported into NGSPICE, where the latencies of the sinks in each quadrant are calculated, and the skew of the reordered netlist is calculated by the same steps as above. Buffer placement is therefore largely heuristic, since the buffers must be reordered iteratively until the target skew is obtained.
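The place, simulate, check and reorder loop can be illustrated with a toy Python model. Here a hypothetical RC delay estimate (buffer drive resistance times window load) stands in for the NGSPICE run, and a simple greedy rule upsizes the buffer of the slowest window, or downsizes the fastest one when the slowest is already at the largest size. This is a swapped-in heuristic for illustration, not the reordering procedure of this work, and the drive resistances are placeholders.

```python
from typing import List, Tuple


def _delays_ps(loads_ff: List[float], sizes: List[int],
               drive_kohm: List[float]) -> List[float]:
    # Toy delay model: R (kOhm) * C (fF) = delay in ps.
    return [drive_kohm[s] * c for s, c in zip(sizes, loads_ff)]


def refine_buffers(loads_ff: List[float], sizes: List[int],
                   drive_kohm: List[float], target_skew_ps: float,
                   max_iter: int = 100) -> Tuple[List[int], float]:
    """Greedily resize buffers until the skew meets target_skew_ps.

    sizes[i] indexes drive_kohm (larger index = stronger buffer, lower R).
    Returns the final sizes and the resulting skew in ps.
    """
    sizes = list(sizes)
    for _ in range(max_iter):
        delays = _delays_ps(loads_ff, sizes, drive_kohm)
        skew = max(delays) - min(delays)
        if skew <= target_skew_ps:
            return sizes, skew
        slowest = delays.index(max(delays))
        fastest = delays.index(min(delays))
        if sizes[slowest] + 1 < len(drive_kohm):
            sizes[slowest] += 1          # speed up the slowest window
        elif sizes[fastest] > 0:
            sizes[fastest] -= 1          # or slow down the fastest window
        else:
            break                        # no further move in this toy model
    delays = _delays_ps(loads_ff, sizes, drive_kohm)
    return sizes, max(delays) - min(delays)
```

With loads of 20fF and 40fF, both starting on a 2kΩ buffer, the delays are 40ps and 80ps; one upsizing step of the heavier window to a 1kΩ buffer equalizes them.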
TOTAL POWER DISSIPATION
The total power dissipated includes the power dissipated in the mesh wire and stub capacitances and the power consumed by the buffers of different sizes.
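As a first-order estimate, the dynamic power of a net that charges and discharges a capacitance $C$ once per clock cycle is $P = \alpha C V_{dd}^2 f$, with activity factor $\alpha = 1$ for a clock. A hypothetical Python sketch applying this to the mesh and stub wire capacitance and to an equivalent switched capacitance per buffer is given below; the per-millimetre wire capacitance and the buffer capacitances are placeholder inputs, and short-circuit and leakage power are ignored.

```python
from typing import List, Tuple


def clock_power_w(mesh_len_mm: float, stub_len_mm: float,
                  c_wire_f_per_mm: float, buffer_caps_f: List[float],
                  vdd: float, freq_hz: float,
                  activity: float = 1.0) -> Tuple[float, float, float]:
    """Dynamic power (W) = activity * C * Vdd^2 * f, split into wire and buffer parts."""
    c_wire = c_wire_f_per_mm * (mesh_len_mm + stub_len_mm)
    c_buf = sum(buffer_caps_f)
    p_wire = activity * c_wire * vdd ** 2 * freq_hz
    p_buf = activity * c_buf * vdd ** 2 * freq_hz
    return p_wire, p_buf, p_wire + p_buf
```

For example, 6mm of wire at an assumed 200fF/mm (1.2pF) switching at 1V and 1GHz dissipates about 1.2mW, so the wire term grows linearly with the total wire length computed during mesh structuring.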
Essay: Essay on the performance of the chip and clock frequency
- Published: 27 July 2014*
- Last Modified: 3 October 2024
Essay Sauce, Essay on the performance of the chip and clock frequency. Available from:<https://www.essaysauce.com/computer-science-essays/essay-performance-chip-clock-frequency/> [Accessed 19-11-24].
These Computer science essays have been submitted to us by students in order to help you with your studies.
* This essay may have been previously published on EssaySauce.com and/or Essay.uk.com at an earlier date than indicated.