This article explains how to use a sample Python script to split the large BRE HTML output files that AWS Transform generates for large COBOL programs.
Overview
Large COBOL programs can contain thousands of business rules. AWS Transform currently stores the program flow and the business rules extracted from the COBOL code as HTML files in Amazon S3 for offline viewing. Because these HTML files can be very large, they often fail to open in browsers.
You can use the Python script below to split large HTML files into smaller files that can be opened in browsers.
BRE: Business Rule Extraction
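Before splitting, it can help to see which HTML files are large enough to cause trouble. The following is a minimal sketch, assuming the files have already been downloaded from Amazon S3 to a local directory. The function name find_large_html is hypothetical, and the 15 MB threshold is an assumption chosen to match the default part size of the script in this article, not a documented browser limit.

```python
from pathlib import Path


def find_large_html(directory, threshold_mb=15):
    """Return (path, size_in_mb) pairs for .html files larger than threshold_mb."""
    threshold_bytes = threshold_mb * 1024 * 1024
    large = []
    # Only the top level of the directory is scanned, in name order.
    for path in sorted(Path(directory).glob("*.html")):
        size = path.stat().st_size
        if size > threshold_bytes:
            large.append((path, size / 1024 / 1024))
    return large


if __name__ == "__main__":
    # Assumption: the BRE HTML files sit in the current directory.
    for path, size_mb in find_large_html("."):
        print(f"{path.name}: {size_mb:.1f} MB")
```

Any file this reports is a candidate for the splitting script described next.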
Python code
Open a text editor and save the code below as a Python file in your preferred directory, for example 'split_bre_html.py'.
#!/usr/bin/env python3
import re
import sys


def split_html(filename, max_size_mb=15):
    max_size = max_size_mb * 1024 * 1024
    with open(filename, 'r', encoding='utf-8') as f:
        content = f.read()

    # Each section opens with a tag of the form <section id="n_<number>-...">.
    sections = list(re.finditer(r'<section id="n_(\d+)-[^"]*">', content))
    if not sections:
        print(f"No BRE sections found in {filename}; nothing to split.")
        return

    # Everything before the first section is the header, repeated in every part.
    header = content[:sections[0].start()]

    # Everything after the last </section> is the footer, also repeated.
    last_section_end = content.rfind('</section>')
    if last_section_end != -1:
        body_end = last_section_end + len('</section>')
    else:
        body_end = len(content)
    footer = content[body_end:]

    current_chunk = header
    chunk_num = 1
    base_name = filename.rsplit('.', 1)[0]
    start_section = sections[0].group(1)
    prev_section_num = start_section

    for i, section in enumerate(sections):
        section_start = section.start()
        # A section runs up to the start of the next one, or to the end of the body.
        if i + 1 < len(sections):
            section_end = sections[i + 1].start()
        else:
            section_end = body_end
        section_content = content[section_start:section_end]
        section_num = section.group(1)

        # Write the current part if adding this section would exceed the limit.
        # A part always holds at least one section, even an oversized one.
        if len(current_chunk.encode('utf-8')) + len(section_content.encode('utf-8')) + len(footer.encode('utf-8')) > max_size and current_chunk != header:
            with open(f'{base_name}_part{chunk_num}.html', 'w', encoding='utf-8') as f:
                f.write(current_chunk + footer)
            print(f"Created {base_name}_part{chunk_num}.html: {len(current_chunk.encode('utf-8'))/1024/1024:.1f}MB (sections {start_section}-{prev_section_num})")
            chunk_num += 1
            current_chunk = header
            start_section = section_num

        current_chunk += section_content
        prev_section_num = section_num

    # Write whatever remains as the final part.
    with open(f'{base_name}_part{chunk_num}.html', 'w', encoding='utf-8') as f:
        f.write(current_chunk + footer)
    print(f"Created {base_name}_part{chunk_num}.html: {len(current_chunk.encode('utf-8'))/1024/1024:.1f}MB (sections {start_section}-{prev_section_num})")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 split_bre_html.py <filename> [size_in_MB]")
        sys.exit(1)
    filename = sys.argv[1]
    size_mb = float(sys.argv[2]) if len(sys.argv) > 2 else 15
    split_html(filename, size_mb)
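The size check in the script above is a greedy packing rule: sections are appended to the current part until the next one would push header, sections, and footer past the limit. The sketch below isolates that rule on plain byte counts so its behavior is easier to reason about. The function plan_chunks and its parameters are hypothetical and only illustrate the logic; they are not part of the script.

```python
def plan_chunks(section_sizes, header_size, footer_size, max_size):
    """Group section sizes (in bytes) into parts using the script's greedy rule."""
    chunks = []
    current = []                 # sizes of the sections in the part being built
    current_size = header_size   # every part starts with a copy of the header
    for size in section_sizes:
        too_big = current_size + size + footer_size > max_size
        # Only close the part if it already holds at least one section.
        if too_big and current:
            chunks.append(current)
            current = []
            current_size = header_size
        current.append(size)
        current_size += size
    if current:
        chunks.append(current)
    return chunks


# Toy numbers: 1-byte header and footer, 10-byte limit.
print(plan_chunks([4, 4, 4, 20, 2], header_size=1, footer_size=1, max_size=10))
# prints [[4, 4], [4], [20], [2]]
```

The 20-byte section ends up alone in a part that exceeds the limit. The same holds for the script: a single section larger than the requested size is not broken up, so the corresponding output file can still be larger than size_in_mb.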
Prerequisites and Instructions:
- Ensure Python is installed on your system.
- Place the BRE HTML file in the same directory as the script, or pass its path on the command line. The output files are written next to the input file.
- Execute the Python command:
python split_bre_html.py <html_file_name> [size_in_mb]
where:
html_file_name: Name of the HTML file you want to split
size_in_mb: Optional maximum size of each output file in MB; decimal values are accepted (default: 15)
After the script completes, you will find the following:
- Output files named after the original file, with the suffixes _part1.html, _part2.html, and so on.
- A summary printed to the terminal showing the size of each output file and the range of BRE sections it contains.
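To confirm that no section was lost or duplicated, you can compare the section numbers in the parts with those in the original file. The following is a minimal sketch under the assumption that sections are identified by the same opening-tag pattern the splitting script uses. The names section_numbers and parts_match_original are hypothetical helpers introduced here for illustration.

```python
import re
from pathlib import Path

# Same opening-tag pattern the splitting script uses to find BRE sections.
SECTION_PATTERN = re.compile(r'<section id="n_(\d+)-[^"]*">')


def section_numbers(path):
    """Return the section numbers found in an HTML file, in document order."""
    text = Path(path).read_text(encoding="utf-8")
    return [match.group(1) for match in SECTION_PATTERN.finditer(text)]


def parts_match_original(original, parts):
    """True when the parts, read in the given order, hold exactly the original's sections."""
    combined = []
    for part in parts:
        combined.extend(section_numbers(part))
    return combined == section_numbers(original)
```

Pass the parts in numeric order (_part1.html, _part2.html, and so on). A plain alphabetical sort would place _part10.html before _part2.html and make the comparison fail even when the split is correct.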
To install Python, go to www.python.org/downloads, download the latest stable release for your operating system (Windows, macOS, or Linux), and run the installer. On Windows, select "Add Python to PATH" during installation. After installation, open your terminal or command prompt and run python --version to verify the installation.