Clock gating / using "highly discouraged" constraint

I have a wide and deep shift register in my design that I want to control using a gated clock. Its size and distribution throughout the entire device make a clock enable an inferior option from a routability and resource usage perspective. I tried the below and believe my design is working but have misgivings about using a constraint that is "highly discouraged" without truly understanding what I'm doing and if there's a recommended alternative.

Guidance on the suitability of my approach / why the constraint is "highly discouraged" would be appreciated.

I sequentially tried:

  1. Inferring from Verilog RTL code in various ways. Nothing looked good.

  2. Instantiating a BUFGCE primitive alone.

CRITICAL WARNING: [DRC HDPR-59] Clock Net Rule Violation: Illegal clock load 'WRAPPER_INST/CL/BUFGCE_unknown' found on PR boundary clock net 'WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1'. Boundary clock nets are not fully supported to drive loads of type BUFGCE inside a reconfigurable region. This type of connection may cause downstream tool issues. The recommended solution is to add an MMCM as the clock load driving the original BUFGCE load.
  3. Instantiating an MMCME4_BASE primitive before a BUFGCE primitive.
Phase 1.2 IO Placement/ Clock Placement/ Build Placer Device
ERROR: [Place 30-718] Sub-optimal placement for an MMCM/PLL-BUFGCE-MMCM/PLL cascade pair.If this sub optimal condition is acceptable for this design, you may use the CLOCK_DEDICATED_ROUTE  constraint in the .xdc file to demote this message to a WARNING. However, the use of this override is highly discouraged. These examples can be used directly in the .xdc file to override this clock rule. 
set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1] 

WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout1_buf (BUFGCE.O) is locked to BUFGCE_X1Y181 (in SLR 1)
The loads are distributed to 1 user pblock constraints. In addition, there are 0 loads not in user pblock constraints.

Displaying the first 1 loads for pblock constraint 1
WRAPPER_INST/CL/MMCME4_BASE_inst (MMCME4_ADV.CLKIN1) is provisionally placed by clockplacer on MMCM_X0Y5 (in SLR 1)

The above error could possibly be related to other connected instances. Following is a list of all the related clock rules and their respective instances.

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/CL/BUFGCE_unknown (BUFGCE.O) is provisionally placed by clockplacer on BUFGCE_X0Y120 (in SLR 1)

Clock Rule: rule_mmcm_bufg
Status: PASS 
Rule Description: A MMCM driving a BUFG must be placed in the same clock region of the device as the BUFG WRAPPER_INST/CL/MMCME4_BASE_inst (MMCME4_ADV.CLKOUT0) is provisionally placed by clockplacer on MMCM_X0Y5 (in SLR 1) WRAPPER_INST/CL/BUFGCE_unknown (BUFGCE.I) is provisionally placed by clockplacer on BUFGCE_X0Y120 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout1_buf (BUFGCE.O) is locked to BUFGCE_X1Y181 (in SLR 1)

Clock Rule: rule_mmcm_bufg
Status: PASS 
Rule Description: A MMCM driving a BUFG must be placed in the same clock region of the device as the BUFG WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/mmcme3_adv_inst (MMCME4_ADV.CLKOUT0) is locked to MMCM_X1Y7 (in SLR 1) WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout1_buf (BUFGCE.I) is locked to BUFGCE_X1Y181 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout2_buf (BUFGCE.O) is locked to BUFGCE_X1Y183 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout3_buf (BUFGCE.O) is locked to BUFGCE_X1Y172 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout4_buf (BUFGCE.O) is locked to BUFGCE_X1Y171 (in SLR 1)

Resolution: The MMCM/PLL-BUFGCE-MMCM/PLL cascade pair can use the dedicated path between them if they are placed in vertically adjacent clock regions and in the same column (LEFT/RIGHT) of the device.
  4. Instantiating an MMCME4_BASE before a BUFGCE primitive and constraining the design with set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1] in cl_pnr_user.xdc. This succeeded.
Some observations and a follow-up question now that I've been working with the gated clock design:

  1. It solved my primary timing problem and seems to be a good choice for my design, though there are secondary issues.
  2. The post-MMCM clock has about 0.10 ns worse clock pessimism than clk_main_a0, so it is generally inferior.
  3. The Xilinx tools treat clk_main_a0 differently from the MMCM clocks, so I'm having timing problems at the interface between the shell and CL. I can work around this by inserting a register and constraining that register to the same SLR as the shell signal (without the constraint the tools kept placing the register after an SLR crossing and failing timing).

I'm kind of surprised by 3 and curious whether this is the expected behavior, or whether there's something I should be doing with the MMCM to get the post-MMCM clocks treated as effectively the same as clk_main_a0. The design didn't have any timing problems at these points before.
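For anyone wanting to reproduce the register workaround, a minimal XDC sketch of pinning the interface register to one SLR with a pblock might look like the following. The pblock name, the choice of SLR1, and the *sh_if_reg* cell pattern are all hypothetical placeholders, not names from the design above; substitute the SLR that actually holds the shell signal and the real register name.

```tcl
# Sketch only: pblock name, SLR, and cell pattern are assumptions.
# Create a pblock that covers one whole SLR.
create_pblock pblock_sh_if
resize_pblock [get_pblocks pblock_sh_if] -add {SLR1}

# Assign the inserted interface register(s) to that pblock so the placer
# cannot move them to the far side of an SLR crossing.
add_cells_to_pblock [get_pblocks pblock_sh_if] \
    [get_cells -hierarchical -filter {NAME =~ "*sh_if_reg*"}]
```

A whole-SLR pblock is a fairly loose constraint: it keeps the register on the right side of the crossing while still leaving placement within the SLR to the tools.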

ljp0101
asked 2 years ago, 418 views
4 Answers
Dear customer

Thank you for your interest in AWS. Just to confirm: you tried a few different approaches and some of them seem to be working, and you would like to know whether any of the working approaches is okay and why using "clock gating" is a highly discouraged option. Is that correct?

Thanks

AWS
answered 2 years ago
Thanks for looking into this.

Suppose my questions at this point are:

  1. What is the canonical way to implement a gated clock on F1? Spitballing but maybe I should be gating something other than clock_main_a0, e.g. whatever clock drives it if it's accessible in the CL.
  2. If the MMCM cascade pair is the canonical approach, why is using an MMCM cascade pair highly discouraged? (As far as I can tell, gating a clock with a BUFGCE is itself a supported technique.)
  3. If the MMCM cascade pair is the canonical approach, is there a way to have the MMCM output clock treated as the same as the MMCM input clock from a Xilinx tools perspective? They are the same clock, even in phase, from what I can tell, but place and route is behaving like they're different.

What I'm doing now is workable but suboptimal for my design, especially around question 3, as I need to put two registers or a false path at the transition between the shell clock and the post-MMCM CL clock to make anything meet 250 MHz timing.
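One way to see how the tools relate the two clocks is to query them from the Vivado Tcl console on the implemented design. This is only a sketch: the net name is the one from the constraint in the question, the MMCM pin is taken from the instance named in the placer log, and the variable names are mine.

```tcl
# Run in the Vivado Tcl console on an open implemented design, not in an .xdc.
# Clock reaching the shell-side net named in the original constraint.
set sh_clk [get_clocks -of_objects [get_nets \
    WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1]]

# Clock generated on the CL MMCM output (instance name from the placer log).
set cl_clk [get_clocks -of_objects \
    [get_pins WRAPPER_INST/CL/MMCME4_BASE_inst/CLKOUT0]]

# Summarize which clock pairs are timed together and whether they are safe.
report_clock_interaction -delay_type min_max

# Show the worst setup paths that cross from the shell clock to the CL clock.
report_timing -from $sh_clk -to $cl_clk -delay_type max -max_paths 10
```

The skew and clock pessimism lines in the report_timing output should show where the extra margin on the shell-to-CL crossing is going.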

ljp0101
answered 2 years ago
Dear customer

1.) clk_main_a0 is generated inside the shell and the clock driving clk_main_a0 is not accessible to CL.
2.) An MMCM connected to clk_main_a0 with set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN is a valid option for clock gating on F1. It is discouraged because of the extra delay, which is expected with this clocking structure. One possibility is to try a PLL instead of an MMCM to see if the delay improves.
3.) Clock domain crossing timing paths are expected because adding an MMCM or PLL creates a new clocking structure. clk_main_a0 is distinct from that new structure, so it is timed differently. However, before applying set_false_path to these paths, make sure it is safe to do so, as normally they are true paths that need synchronizers.
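As a rough illustration of point 3, the following XDC sketch marks a two-flop synchronizer and cuts timing only on the path into its first stage, rather than on every path between the two clocks. The register names (*sync_ff0_reg*, *sync_ff1_reg*) are hypothetical; they assume such a synchronizer already exists in the CL RTL.

```tcl
# Sketch only: sync_ff0_reg / sync_ff1_reg are assumed synchronizer flop names.
# Mark both synchronizer stages so the tools keep them packed closely together
# and treat them as a metastability-hardening chain.
set_property ASYNC_REG TRUE \
    [get_cells -hierarchical -filter {NAME =~ "*sync_ff0_reg*" || NAME =~ "*sync_ff1_reg*"}]

# Cut timing only on the data input of the first synchronizer stage.
set_false_path -to \
    [get_pins -hierarchical -filter {NAME =~ "*sync_ff0_reg*/D"}]
```

Scoping the false path to the synchronizer input leaves any other shell-to-CL paths timed, so an unsynchronized crossing still shows up as a timing failure instead of being silently ignored.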

Thanks

AWS
answered 2 years ago
Thanks. Seems like the MMCM cascade pair is just a weird thing to do from a Xilinx context but makes sense for the F1 use case. For anyone else looking into this, I ended up treating the shell and CL clocks as async and ran my design off the CL clocks, as it wasn't a huge deal with my design. I did notice, though, that using the MMCM feedback clock with a post-BUFG generated mirror clock brought down the clk_main_a0 to CL clock skew, so it was possible to rely on the phase relationship to cross clock domains at 250 MHz, but that didn't seem to be the case at 500 MHz.
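For reference, treating the two domains as asynchronous can be expressed in XDC roughly as below. This is a sketch under assumptions, not the exact constraints I used: the net name comes from the constraint in the question, the MMCM pin from the instance named in the placer log, and the variable names are illustrative.

```tcl
# Sketch only: object names are taken from the thread, variables are mine.
set sh_clk [get_clocks -of_objects [get_nets \
    WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1]]
set cl_clk [get_clocks -of_objects \
    [get_pins WRAPPER_INST/CL/MMCME4_BASE_inst/CLKOUT0]]

# Declare the shell and CL clocks asynchronous to each other.
# This removes timing analysis between the groups in both directions.
set_clock_groups -asynchronous -group $sh_clk -group $cl_clk
```

Because this stops all timing analysis between the two groups, every signal crossing between the shell and CL clocks then needs a proper synchronizer or an async FIFO.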

ljp0101
answered 2 years ago
