How to make a static copy of a Joomla site with wget

2 Responses

  1. Thanks for the instructions, this is excellent. We just ran into a case where we need to do this (keep content online but drop obsolete CMS version).

  2. algol says:

    Great post. It helped me “start” playing with wget for mirroring my old Joomla…
    I found in the man page that there is an option to download “missing files”: “-p”, or “--page-requisites”.
    So a command like:
    wget -p -m -k -K -E http://your.domain.com/
    will get (almost) all css, js and images.
    There is a quick way to check which files are still missing:
    grep -lr "http:\/\/" | sort -u | xargs sed -ne '/http:\/\/your\.domain\.com/s/.*"http:\/\/\([^"]*\).*/http:\/\/\1/p' | sort -u
    And you can fetch them using:
    grep -lr "http:\/\/" | sort -u | xargs sed -ne '/http:\/\/your\.domain\.com/s/.*"http:\/\/\([^"]*\).*/http:\/\/\1/p' | sort -u | wget -x -nH --cut-dirs=2 -i -
    You still have to use the browser's developer tools to find missing files that are included by JavaScript or in some other (strange) way.
    But I wanted my mirror to keep the same links, and for that the file names can’t include the “.html” extension.
    I found that the “-p” option (and “-k”) doesn’t work so well if you don’t use the “-E” option.
    Using “-E” together with “-p” is the best way to get the page requisites. So I did a first fetch with “-E”, deleted all “.html” files and then fetched everything again without “-E”.
    As “-k” doesn’t work that well without “-E”, I also had to make some other substitutions:
    # Convert all remaining absolute URLs to relative ones
    grep -lr "http:\/\/" | sort -u | xargs sed -i -e '/http:\/\/your\.domain\.com/s/http:\/\/your\.domain\.com\/\([^"]*\)/\1/g'
    # Convert a "?" inside a URL to its URL-encoded equivalent (%3F)
    grep -lr --exclude=*.{css,js} '=\s\{0,1\}"[^?"]*?[^"]*"' | sort -u | xargs sed -i -e '/\(=\s\{0,1\}"[^?"]*\)?\([^"]*"\)/s/\(=\s\{0,1\}"[^?"]*\)?\([^"]*"\)/\1%3F\2/g'

    Hope that helps.
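
The two-pass fetch described in the comment above might be scripted roughly as follows. This is only a sketch, not the commenter's actual script: the function names, the `-P` target directory and the simplified URL pattern are assumptions added for illustration, and `grep -o` is assumed to be available (it is in GNU and BSD grep). The block only defines functions, so nothing is downloaded until `mirror_two_pass` is called.

```shell
#!/bin/sh
# Sketch of the two-pass mirror; all function names here are hypothetical.

# Delete the ".html" copies created by the first (-E) pass so that the
# second pass can fetch the pages again under their original names.
remove_html_copies() {
    # $1: directory that holds the mirror
    find "$1" -type f -name '*.html' -exec rm -f {} +
}

# Print every absolute URL in the mirror that still points at the live site.
list_absolute_urls() {
    # $1: mirror directory, $2: host name such as your.domain.com
    # Escape the dots so they match literally in the regular expression.
    host_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    grep -rhoE "http://$host_re/[^\"' >)]*" "$1" | sort -u
}

# Run both passes. The wget flags are the ones from the comment;
# -P (target directory) is an addition made for this sketch.
mirror_two_pass() {
    # $1: site URL, $2: target directory
    # Pass 1: with -E, so that -p and -k pick up the page requisites.
    wget -p -m -k -K -E -P "$2" "$1"
    remove_html_copies "$2"
    # Pass 2: without -E, so pages keep their original file names.
    wget -p -m -k -K -P "$2" "$1"
}
```

A call such as `mirror_two_pass http://your.domain.com/ ./mirror` would run both passes, and `list_absolute_urls ./mirror your.domain.com` would then show which links still point at the live site and need one of the sed substitutions above.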
