my-scraper

Web Scraper Chrome Extension

A powerful Chrome extension that saves web pages as static HTML with all images, organized in a clean folder structure. Features advanced sitemap support, batch downloading, and a beautiful tree-view interface for easy page selection.

✨ Features

🎯 Core Functionality

📋 Sitemap Integration

⚡ Batch Operations

🎨 Modern UI

🔄 Two Download Modes

Manual Mode (Save Page X)

  1. Load sitemap or navigate to desired page
  2. Click “Save Page X” to download current page
  3. Extension automatically navigates to next URL in sitemap
  4. Reopen popup and click again for next page
  5. Progress tracked across sessions
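
The manual flow above can be pictured as a small cursor over the sitemap URLs. The sketch below is illustrative only: the state shape `{ urls, index }` and the function names `advance` and `buttonLabel` are assumptions, not the extension's actual code. Because the state is plain JSON, it could be persisted between popup sessions (for example with `chrome.storage.local`, which is also an assumption about how the extension stores it).

```javascript
// Hypothetical state shape: { urls: string[], index: number }.
// index points at the page that the next click will save.

// Label for the manual-mode button, assuming X is a 1-based page number.
function buttonLabel(state) {
  return `Save Page ${state.index + 1}`;
}

// Save the current page and move the cursor to the next sitemap URL.
function advance(state) {
  if (state.index >= state.urls.length) {
    // Nothing left to save; leave the state untouched.
    return { saved: null, nextUrl: null, done: true, state };
  }
  const nextIndex = state.index + 1;
  const done = nextIndex >= state.urls.length;
  return {
    saved: state.urls[state.index],
    nextUrl: done ? null : state.urls[nextIndex],
    done,
    state: { urls: state.urls, index: nextIndex },
  };
}
```

Each click would call `advance`, store the returned `state`, and navigate the tab to `nextUrl` when it is not `null`.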

Batch Mode (Save Selected Pages)

  1. Load sitemap to see all available URLs
  2. Check desired pages (or use Select All)
  3. Click “Save Selected Pages”
  4. All pages download automatically in background
  5. Receive notifications for progress and completion
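
A minimal sketch of the batch loop, assuming pages are saved one after another and that a failed page should not stop the rest. `saveSelectedPages`, `savePage`, and `notify` are hypothetical names introduced for illustration; in the extension, `notify` might wrap a browser notification call, but that is an assumption.

```javascript
// Hypothetical helpers passed in by the caller:
//   savePage(url) -> Promise that resolves once the page is stored
//   notify(event) -> reports progress or completion to the user
async function saveSelectedPages(urls, savePage, notify) {
  const saved = [];
  const failed = [];
  for (const [i, url] of urls.entries()) {
    try {
      await savePage(url);
      saved.push(url);
    } catch (err) {
      // Record the failure and continue so one bad page does not stop the batch.
      failed.push({ url, reason: err instanceof Error ? err.message : String(err) });
    }
    notify({ type: "progress", completed: i + 1, total: urls.length, url });
  }
  notify({ type: "complete", saved: saved.length, failed: failed.length });
  return { saved, failed };
}
```

Processing pages sequentially keeps the progress count simple; a real implementation might download several pages in parallel instead.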

📁 Folder Structure

Downloads are organized as:

domain_timestamp/
├── page_1/
│   ├── index.html
│   └── images/
│       ├── 1_image1.jpg
│       ├── 2_image2.png
│       └── ...
├── page_2/
│   ├── index.html
│   └── images/
│       └── ...
└── ...
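
The layout above could be produced by a few path helpers like the ones sketched here. The function names, the `YYYYMMDD_HHMMSS` UTC timestamp format, and the sanitizing rule are all assumptions made for illustration; the extension's real naming logic may differ.

```javascript
// Replace characters that are unsafe in file or folder names.
function sanitize(name) {
  return name.replace(/[^a-zA-Z0-9._-]/g, "_");
}

// Build the top-level "domain_timestamp" folder name.
// Assumed timestamp format: YYYYMMDD_HHMMSS in UTC.
function rootFolder(pageUrl, date) {
  const host = new URL(pageUrl).hostname;
  const stamp = date
    .toISOString()        // e.g. 2024-01-02T03:04:05.000Z
    .replace(/[-:]/g, "")
    .replace("T", "_")
    .slice(0, 15);        // keep date, underscore, and time
  return `${sanitize(host)}_${stamp}`;
}

// Path of the saved HTML for the Nth page.
function pagePath(root, pageNumber) {
  return `${root}/page_${pageNumber}/index.html`;
}

// Path of the Nth image on a page, prefixed with its index as in the tree above.
function imagePath(root, pageNumber, imageNumber, imageUrl) {
  const fileName = new URL(imageUrl).pathname.split("/").pop() || "image";
  return `${root}/page_${pageNumber}/images/${imageNumber}_${sanitize(fileName)}`;
}
```

Prefixing each image with its index keeps two images that share a file name from overwriting each other.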

🧪 Unit Tests

The project includes a suite of unit tests for the core logic (sitemap parsing, scraping, naming).

  1. Install dependencies:

     npm install
    
  2. Run tests:

     npm test
    

See TESTS-README.md for more details on the testing setup and architecture.

🚀 How to Test Locally

To test the extension on your local machine:

  1. Open Chrome Extensions:

    • Open Google Chrome browser
    • Navigate to chrome://extensions in the address bar
  2. Enable Developer Mode:

    • Toggle “Developer mode” switch in the top-right corner to “On”
  3. Load the Extension:

    • Click the “Load unpacked” button in the top-left corner
    • Navigate to the project’s extension/ directory and click “Select”
    • Ensure manifest.json sits directly inside the extension/ folder
  4. Verify Installation:

    • Extension should appear in your list of installed extensions
    • Click the extension icon in Chrome toolbar (or pin it from the puzzle piece menu)

🎮 How to Use

Single Page Download

  1. Navigate to any web page
  2. Click the extension icon
  3. Click “💾 Save Page” button
  4. Page and images download to your Downloads folder

Sitemap-Based Download

  1. Navigate to any website
  2. Click extension icon
  3. Click “📋 Load Sitemap” to fetch sitemap.xml
  4. Browse URLs in tree structure (grouped by category)
  5. Expand categories by clicking the arrow or category name
  6. Check boxes for pages you want to download
  7. Choose your mode:
    • Manual: Click “💾 Save Page X” to download one at a time
    • Batch: Click “⚡ Save Selected Pages” for automatic batch download
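
To illustrate how sitemap URLs might end up in a category tree, here is a rough sketch that extracts the `<loc>` entries from a sitemap.xml string and groups them by first path segment. Both function names and the grouping rule are assumptions; the extension may group differently. A regular expression is used so the sketch runs outside the browser, whereas code running in a page would more likely use an XML parser. CDATA-wrapped URLs are not handled.

```javascript
// Pull every <loc> value out of a sitemap XML string.
function extractSitemapUrls(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/gi;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;" inside URLs.
    urls.push(match[1].replace(/&amp;/g, "&"));
  }
  return urls;
}

// Group URLs by their first path segment (assumed meaning of "category").
function groupByCategory(urls) {
  const groups = new Map();
  for (const url of urls) {
    const segments = new URL(url).pathname.split("/").filter(Boolean);
    // Pages directly under the site root share a "(root)" group.
    const category = segments.length > 1 ? segments[0] : "(root)";
    if (!groups.has(category)) groups.set(category, []);
    groups.get(category).push(url);
  }
  return groups;
}
```

With this rule, `https://example.com/blog/post-1` lands under `blog`, while `https://example.com/about` lands under `(root)`.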

Intelligent Scanning

  1. Click “🔍 Smart Website Scan” to start the hybrid discovery engine
  2. The engine automatically checks for sitemaps and falls back to a crawler if none are found
  3. Watch the page tree grow in real-time as new pages are discovered
  4. Use the “🔄” icon in the header to start a fresh session at any time
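
The sitemap-first, crawler-fallback behaviour could look roughly like the sketch below. It is not the extension's actual engine: `discoverPages`, `crawlSite`, `fetchSitemapUrls`, `getLinks`, `onPage`, and the `maxPages` limit are hypothetical names and parameters, and the crawler is a plain breadth-first walk restricted to the starting origin.

```javascript
// Breadth-first crawl limited to the start URL's origin.
// getLinks(url) is a hypothetical helper returning the hrefs found on a page.
async function crawlSite(startUrl, getLinks, maxPages = 50) {
  const start = new URL(startUrl);
  const seen = new Set([start.href]);
  const queue = [start.href];
  while (queue.length > 0 && seen.size < maxPages) {
    const current = queue.shift();
    for (const link of await getLinks(current)) {
      const absolute = new URL(link, current);
      absolute.hash = ""; // treat /b and /b#top as the same page
      if (absolute.origin !== start.origin || seen.has(absolute.href)) continue;
      if (seen.size >= maxPages) break;
      seen.add(absolute.href);
      queue.push(absolute.href);
    }
  }
  return [...seen];
}

// Try the sitemap first; fall back to crawling when it is missing or empty.
async function discoverPages(startUrl, fetchSitemapUrls, getLinks, onPage = () => {}) {
  let source = "sitemap";
  let pages = [];
  try {
    pages = await fetchSitemapUrls(new URL("/sitemap.xml", startUrl).href);
  } catch {
    pages = []; // an unreachable sitemap is treated the same as an empty one
  }
  if (pages.length === 0) {
    source = "crawler";
    pages = await crawlSite(startUrl, getLinks);
  }
  // Report each page so a UI could update its tree as results arrive.
  pages.forEach((url) => onPage(url));
  return { source, pages };
}
```

Passing the fetch and link-extraction helpers in as arguments keeps the discovery logic testable without a browser or network access.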

Managing Downloads

🛠️ Technical Details

Technologies Used

Key Features

Permissions Required

📦 How to Submit to Chrome Web Store

To publish your extension on the Chrome Web Store:

  1. Package the Extension:

    • Create a .zip file from the contents of the extension/ directory, not the directory itself
    • Ensure manifest.json is at the root of the zip file
    • Do not include .git, node_modules, or other development files
  2. Go to Chrome Developer Dashboard:

  3. Add a New Item:

    • Click “Add new item” button
    • Upload your .zip file
  4. Complete Store Listing:

    • Fill out all required information:
      • Description: Detailed feature list and benefits
      • Icons: Provide 128x128, 48x48, and 16x16 PNG icons
      • Screenshots: Add 1280x800 or 640x400 screenshots showcasing features
      • Category: Choose “Productivity” or “Developer Tools”
      • Language: Select primary language
  5. Privacy Practices:

    • Declare data usage and privacy policy
    • Explain permissions required and why
    • State that no user data is collected or transmitted
  6. Submit for Review:

    • Click “Submit for review”
    • Review often completes within a few business days, but times vary
    • Address any feedback from Chrome Web Store team

Once approved, your extension will be publicly available on the Chrome Web Store.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

📄 License

This project is open source and available under the MIT License.