Clock gating / using "highly discouraged" constraint

Question

I have a wide and deep shift register in my design that I want to control using a gated clock. Its size and distribution throughout the entire device make a clock enable an inferior option from a routability and resource usage perspective. I tried the below and believe my design is working but have misgivings about using a constraint that is "highly discouraged" without truly understanding what I'm doing and if there's a recommended alternative.

Guidance on the suitability of my approach / why the constraint is "highly discouraged" would be appreciated.

I sequentially tried:

1) Inferring from Verilog RTL code in various ways. Nothing looked good.

2) Instantiating a BUFGCE primitive alone.
```
CRITICAL WARNING: [DRC HDPR-59] Clock Net Rule Violation: Illegal clock load 'WRAPPER_INST/CL/BUFGCE_unknown' found on PR boundary clock net 'WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1'. Boundary clock nets are not fully supported to drive loads of type BUFGCE inside a reconfigurable region. This type of connection may cause downstream tool issues. The recommended solution is to add an MMCM as the clock load driving the original BUFGCE load.
```

3) Instantiating a MMCME4_BASE primitive before a BUFGCE primitive.
```
Phase 1.2 IO Placement/ Clock Placement/ Build Placer Device
ERROR: [Place 30-718] Sub-optimal placement for an MMCM/PLL-BUFGCE-MMCM/PLL cascade pair.If this sub optimal condition is acceptable for this design, you may use the CLOCK_DEDICATED_ROUTE  constraint in the .xdc file to demote this message to a WARNING. However, the use of this override is highly discouraged. These examples can be used directly in the .xdc file to override this clock rule. 
set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1]

WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout1_buf (BUFGCE.O) is locked to BUFGCE_X1Y181 (in SLR 1)
The loads are distributed to 1 user pblock constraints. In addition, there are 0 loads not in user pblock constraints.

Displaying the first 1 loads for pblock constraint 1
WRAPPER_INST/CL/MMCME4_BASE_inst (MMCME4_ADV.CLKIN1) is provisionally placed by clockplacer on MMCM_X0Y5 (in SLR 1)

The above error could possibly be related to other connected instances. Following is a list of all the related clock rules and their respective instances.

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/CL/BUFGCE_unknown (BUFGCE.O) is provisionally placed by clockplacer on BUFGCE_X0Y120 (in SLR 1)

Clock Rule: rule_mmcm_bufg
Status: PASS 
Rule Description: A MMCM driving a BUFG must be placed in the same clock region of the device as the BUFG WRAPPER_INST/CL/MMCME4_BASE_inst (MMCME4_ADV.CLKOUT0) is provisionally placed by clockplacer on MMCM_X0Y5 (in SLR 1) WRAPPER_INST/CL/BUFGCE_unknown (BUFGCE.I) is provisionally placed by clockplacer on BUFGCE_X0Y120 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout1_buf (BUFGCE.O) is locked to BUFGCE_X1Y181 (in SLR 1)

Clock Rule: rule_mmcm_bufg
Status: PASS 
Rule Description: A MMCM driving a BUFG must be placed in the same clock region of the device as the BUFG WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/mmcme3_adv_inst (MMCME4_ADV.CLKOUT0) is locked to MMCM_X1Y7 (in SLR 1) WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout1_buf (BUFGCE.I) is locked to BUFGCE_X1Y181 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout2_buf (BUFGCE.O) is locked to BUFGCE_X1Y183 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout3_buf (BUFGCE.O) is locked to BUFGCE_X1Y172 (in SLR 1)

Clock Rule: rule_bufgce_bufg_conflict
Status: PASS 
Rule Description: Only one of the 2 available sites (BUFGCE or BUFGCE_DIV/BUFGCTRL) in a pair can be used at the same time WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clkout4_buf (BUFGCE.O) is locked to BUFGCE_X1Y171 (in SLR 1)

Resolution: The MMCM/PLL-BUFGCE-MMCM/PLL cascade pair can use the dedicated path between them if they are placed in vertically adjacent clock regions and in the same column (LEFT/RIGHT) of the device.
```

4) Instantiating a MMCME4_BASE before a BUFGCE primitive and constraining the design with `set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN [get_nets WRAPPER_INST/SH/kernel_clks_i/clkwiz_sys_clk/inst/CLK_CORE_DRP_I/clk_inst/clk_out1]` in cl_pnr_user.xdc. This met with success.
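
For readers following along, the structure in steps 3 and 4 can be sketched roughly as below. This is a minimal sketch, not the original poster's code: the module name, port names, and instance names are hypothetical, and the MMCM settings assume a 250 MHz input clock passed through at 1:1. The MMCM feedback is looped back directly here, without a feedback buffer.

```verilog
// Hypothetical sketch: MMCM followed by a BUFGCE to gate a copy of the shell clock.
// All names and the 250 MHz / multiply-by-4 / divide-by-4 settings are assumptions.
module gated_clk_sketch (
    input  wire clk_main_a0,  // shell clock entering the CL
    input  wire rst,          // active-high MMCM reset
    input  wire clk_en,       // gate control, assumed synchronous to the MMCM output
    output wire clk_gated,    // gated clock for the shift register
    output wire locked        // MMCM lock indicator
);
    wire clk_fb;    // MMCM feedback, looped back directly
    wire clk_mmcm;  // ungated MMCM output

    MMCME4_BASE #(
        .CLKIN1_PERIOD    (4.000),  // 250 MHz input (assumed)
        .DIVCLK_DIVIDE    (1),
        .CLKFBOUT_MULT_F  (4.000),  // VCO = 250 MHz * 4 = 1000 MHz
        .CLKOUT0_DIVIDE_F (4.000)   // CLKOUT0 = 1000 MHz / 4 = 250 MHz
    ) mmcm_inst (
        .CLKIN1   (clk_main_a0),
        .CLKFBIN  (clk_fb),
        .CLKFBOUT (clk_fb),
        .CLKOUT0  (clk_mmcm),
        .LOCKED   (locked),
        .PWRDWN   (1'b0),
        .RST      (rst)
    );

    // BUFGCE passes the clock only while CE is high; hold it off until the MMCM locks.
    BUFGCE #(
        .CE_TYPE ("SYNC")
    ) bufgce_inst (
        .I  (clk_mmcm),
        .CE (clk_en & locked),
        .O  (clk_gated)
    );
endmodule
```

With this structure the CLOCK_DEDICATED_ROUTE constraint from step 4 is still needed on the shell clock net, since the placer may not be able to put the CL MMCM on the dedicated cascade path.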

Answer

Thanks for looking into this.

I suppose my questions at this point are:
1) What is the canonical way to implement a gated clock on F1? Spitballing, but maybe I should be gating something other than clk_main_a0, e.g. whatever clock drives it, if it's accessible in the CL.
2) If the MMCM cascade pair is the canonical approach, why is using an MMCM cascade pair highly discouraged? (Clock gating itself is fully supported and not discouraged by Xilinx.)
3) If the MMCM cascade pair is the canonical approach, is there a way to have the MMCM output clock treated the same as the MMCM input clock from a Xilinx tools perspective? They are the same clock, even in phase from what I can tell, but place and route is behaving like they're different.

What I'm doing now is workable but suboptimal for my design, especially around question 3, as I need to put two registers or false path between the shell and post MMCM CL clock transition to make anything meet 250 MHz timing.

Answer

Dear customer

Thank you so much for your interest in AWS. Just to make sure I understand: you tried a few different approaches, and some of them seem to be working. You would like to know whether any of the working approaches is okay, and why using "clock gating" is a highly discouraged option. Is that correct?

Thanks

Answer

Dear customer

1.) clk_main_a0 is generated inside the shell and the clock driving clk_main_a0 is not accessible to CL.  
2.) An MMCM connected to clk_main_a0 with set_property CLOCK_DEDICATED_ROUTE ANY_CMT_COLUMN is a valid option for clock gating on F1. It is discouraged because of the "extra delay", which is expected with this clocking structure. One possibility is to try a PLL instead of an MMCM to see if the delay improves.  
3.) Clock domain crossing timing paths are expected, as a new clocking structure is created with the MMCM or PLL. clk_main_a0 is different from the new clocking structure, so it’s timed differently. However, before applying set_false_path to these paths, it is recommended to make sure that doing so is safe, as normally they are true paths that need synchronizers.

Thanks

Answer

Thanks. Seems like the MMCM cascade pair is just a weird thing to do in a Xilinx context but makes sense for the F1 use case. For anyone else looking into this, I ended up treating the shell and CL clocks as async and running my design off the CL clocks, as it wasn't a huge deal with my design. I did notice, though, that using the MMCM feedback clock with a post-BUFG generated mirror clock brought down the clk_main_a0 to CL clock skew, so it was possible to rely on the phase relationship to cross clock domains at 250 MHz, but that didn't seem to be the case at 500 MHz.
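
For reference, treating the two clocks as async usually means passing each single-bit control signal through a two-register synchronizer in the destination domain. The sketch below is a generic example of that technique, not the code used in this design; the module and signal names are hypothetical. It is only suitable for single-bit, slowly changing signals; multi-bit data would typically need a handshake or an asynchronous FIFO instead.

```verilog
// Hypothetical two-flop synchronizer for one bit crossing from the shell clock
// domain into the CL (post-MMCM) clock domain. Names are illustrative only.
module sync_2ff (
    input  wire clk_dst,  // destination-domain clock
    input  wire d_async,  // signal launched from the other clock domain
    output wire q_sync    // synchronized copy, delayed by two clk_dst edges
);
    // ASYNC_REG asks Vivado to keep the two registers together and treat
    // them as a synchronizer chain.
    (* ASYNC_REG = "TRUE" *) reg [1:0] sync_ff = 2'b00;

    always @(posedge clk_dst)
        sync_ff <= {sync_ff[0], d_async};

    assign q_sync = sync_ff[1];
endmodule
```

The path into the first register is the one that would then be covered by a false path or asynchronous clock group constraint.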
