
Example 1: Complete Webpage Scraping and Regeneration Workflow

This example demonstrates the complete WhyML workflow: scraping a webpage into a YAML manifest, simplifying its structure along the way, and regenerating HTML from that manifest.

Files in this example:

Workflow Steps:

1. Scrape a webpage and generate YAML manifest

whyml scrape https://example.com --output scraped-manifest.yaml --simplify-structure --max-depth 5
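The `--simplify-structure` and `--max-depth 5` options suggest that deep nesting is pruned during scraping, but how WhyML does this is not described here. The following is a hypothetical, standard-library-only Python sketch of the general idea of a depth limit: it walks the HTML and keeps only the first `max_depth` levels of nesting. The class and function names are illustrative and are not part of WhyML.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DepthLimitedOutline(HTMLParser):
    """Hypothetical sketch: list element names, ignoring anything nested
    deeper than max_depth (assumes reasonably well-formed HTML)."""

    def __init__(self, max_depth: int) -> None:
        super().__init__()
        self.max_depth = max_depth
        self.depth = 0
        self.outline: list[str] = []  # two spaces of indent per nesting level

    def handle_starttag(self, tag, attrs):
        # Record the element only if it sits within the allowed depth.
        if self.depth < self.max_depth:
            self.outline.append("  " * self.depth + tag)
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def outline(html: str, max_depth: int) -> list[str]:
    """Return the indented element outline of `html`, truncated at max_depth."""
    parser = DepthLimitedOutline(max_depth)
    parser.feed(html)
    parser.close()
    return parser.outline


sample = "<html><body><div><section><p>Hi<br>there</p></section></div></body></html>"
print("\n".join(outline(sample, max_depth=3)))
```

With `max_depth=3` the sample prints `html`, `body` and `div`; the `section`, `p` and `br` elements below that level are dropped.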

2. Convert YAML manifest back to HTML

whyml convert --from scraped-manifest.yaml --to regenerated.html --as html
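The manifest format produced in step 1 is not documented in this example, so the sketch below assumes a hypothetical node shape (`tag`, `attrs`, `text`, `children`) purely to illustrate how a tree-shaped manifest can be rendered back to HTML recursively. WhyML's real schema and converter may differ; the manifest is written as a Python dict here so the sketch needs no YAML library.

```python
from html import escape

# Hypothetical node shape (the real WhyML manifest schema may differ):
# {"tag": str, "attrs": {str: str}, "text": str, "children": [node, ...]}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def render(node: dict) -> str:
    """Recursively render one hypothetical manifest node as an HTML string."""
    tag = node["tag"]
    attrs = "".join(
        f' {name}="{escape(str(value), quote=True)}"'
        for name, value in node.get("attrs", {}).items()
    )
    if tag in VOID_TAGS:
        return f"<{tag}{attrs}>"  # void elements get no closing tag
    inner = escape(node.get("text", ""))
    inner += "".join(render(child) for child in node.get("children", []))
    return f"<{tag}{attrs}>{inner}</{tag}>"


manifest = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title", "text": "Sample page"}]},
        {"tag": "body", "children": [
            {"tag": "h1", "text": "Hello"},
            {"tag": "a", "attrs": {"href": "https://example.com"}, "text": "More info"},
        ]},
    ],
}

print("<!DOCTYPE html>\n" + render(manifest))
```

Text and attribute values are passed through `html.escape`, so characters such as `<` and `&` in the manifest cannot break the generated markup.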

🚀 Easy Way to Run This Example

Instead of running the commands manually, you can use one of the provided scripts:

Option 1: Run with Shell Script

# From WhyML root directory
./scripts/examples/run-example-1.sh

Option 2: Run with Python Script

# From WhyML root directory
python3 scripts/examples/run-example-1.py

Option 3: Run All Examples

# From WhyML root directory
./scripts/run-all-examples.sh
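The contents of the runner scripts are not reproduced here. As a rough, hypothetical sketch of what a Python runner for steps 1 and 2 could look like (not the actual `scripts/examples/run-example-1.py`), the following runs the two commands above in order with `subprocess` and stops at the first failure. `STEPS` and `run_steps` are illustrative names.

```python
import shlex
import subprocess
import sys

# The two commands from steps 1 and 2 above, as argument lists.
STEPS = [
    ["whyml", "scrape", "https://example.com", "--output", "scraped-manifest.yaml",
     "--simplify-structure", "--max-depth", "5"],
    ["whyml", "convert", "--from", "scraped-manifest.yaml", "--to", "regenerated.html",
     "--as", "html"],
]


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for number, argv in enumerate(steps, start=1):
        print(f"[step {number}] {shlex.join(argv)}", flush=True)
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"[step {number}] failed with exit code {result.returncode}",
                  file=sys.stderr)
            return result.returncode
    return 0
```

Calling `sys.exit(run_steps(STEPS))` from the WhyML root directory would execute both steps, assuming the `whyml` command is on your `PATH`.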

These scripts will:

3. Compare the original and regenerated pages (optional)

whyml scrape https://example.com --test-conversion --output-html regenerated.html
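The exact checks performed by `--test-conversion` are not described here. As a hypothetical, standard-library-only sketch of one way to compare the original page with `regenerated.html` by hand, the following counts start tags in each document and measures how similar their visible text is. The helper names are illustrative, not part of WhyML.

```python
from collections import Counter
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Collect start-tag counts and whitespace-normalised visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = Counter()
        self.text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(" ".join(data.split()))


def summarize(html: str) -> tuple[Counter, str]:
    collector = TagAndTextCollector()
    collector.feed(html)
    collector.close()
    return collector.tags, " ".join(collector.text_parts)


def compare(original: str, regenerated: str) -> dict:
    """Hypothetical comparison: tag-count differences plus a text similarity ratio."""
    tags_a, text_a = summarize(original)
    tags_b, text_b = summarize(regenerated)
    return {
        "missing_tags": dict(tags_a - tags_b),  # tags that occur fewer times after regeneration
        "extra_tags": dict(tags_b - tags_a),    # tags that occur more times after regeneration
        "text_similarity": SequenceMatcher(None, text_a, text_b).ratio(),
    }
```

To check real output, pass the original page's HTML and the contents of `regenerated.html` (for example, read with `open(path, encoding="utf-8").read()`) to `compare`. Some entries in `missing_tags` are to be expected when structure simplification has removed wrapper elements.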

Key Features Demonstrated:

Expected Results:

The regenerated HTML should preserve the essential structure and content of the original webpage, while being cleaner and easier to maintain because it is generated from a simplified YAML manifest.