adapting the generated access scripts

Of course, such generated scripts are only the raw version of the scripts you will run in production. HTML pages contain many hidden form fields and URLs that carry so-called session variables, and we have to extract these strings from the HTML and pass them along in subsequent requests. This is a kind of reverse engineering, since nobody explains to us the concept behind the web pages or how they keep state. Be aware that web servers also encode the user's identity into that state: if the state were easy to decode completely, you could impersonate any other user without ever authenticating as that user. This has happened in the past.
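The extraction step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library's `html.parser` and `urllib.parse`, not the author's Perl code. It collects `<input type="hidden">` fields and looks for session-like parameters in `href`, `action` and `src` URLs. The parameter names in `SESSION_PARAM_NAMES`, the class `StateExtractor` and the function `extract_state` are hypothetical; a real site may use entirely different names, which you would discover by inspecting its pages.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

# Hypothetical parameter names that often carry session state;
# replace them with whatever the target site actually uses.
SESSION_PARAM_NAMES = {"sid", "sessionid", "jsessionid", "phpsessid"}


class StateExtractor(HTMLParser):
    """Collects hidden form fields and session-like URL parameters."""

    def __init__(self):
        super().__init__()
        self.hidden = {}   # name -> value of each <input type="hidden">
        self.session = {}  # parameter name -> value found in URLs

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "input" and (a.get("type") or "").lower() == "hidden":
            if a.get("name"):
                self.hidden[a["name"]] = a.get("value") or ""
        for key in ("href", "action", "src"):
            url = a.get(key)
            if url:
                self._scan_url(url)

    def _scan_url(self, url):
        parts = urlsplit(url)
        # Session IDs passed as ordinary query parameters, e.g. ?sid=...
        for name, values in parse_qs(parts.query).items():
            if name.lower() in SESSION_PARAM_NAMES:
                self.session[name] = values[0]
        # Java servlet containers may append ";jsessionid=..." to the path.
        marker = ";jsessionid="
        idx = parts.path.lower().find(marker)
        if idx != -1:
            self.session["jsessionid"] = parts.path[idx + len(marker):].split(";")[0]


def extract_state(html):
    """Return (hidden_fields, session_params) found in an HTML page."""
    parser = StateExtractor()
    parser.feed(html)
    parser.close()
    return parser.hidden, parser.session
```

In a generated access script you would call something like `extract_state` on every response and copy the returned values into the next request's form data or URL, because the server usually rejects requests that carry stale state.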
Once a new raw Perl script has been generated properly, it needs to be embedded in a proper frame. You might adopt p.pl for that purpose; it serves me well.
Until it works well, I always call p.pl with
crawlink.pl pretty-prints HTML forms.
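The text does not show what crawlink.pl prints, so the following is only a hypothetical Python analogue of the idea of pretty-printing HTML forms, not a reimplementation of that script. It uses the standard library's `html.parser` to list each form's method and action, followed by its `input`, `select` and `textarea` fields. The names `FormPrinter` and `pretty_print_forms` and the output layout are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class FormPrinter(HTMLParser):
    """Records each <form> together with its input, select and textarea fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # list of dicts with action, method and fields
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {
                "action": a.get("action") or "",
                # HTML forms default to GET when no method is given
                "method": (a.get("method") or "get").upper(),
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            # <input> defaults to type "text"; other elements use their tag name
            kind = (a.get("type") or "text").lower() if tag == "input" else tag
            self._current["fields"].append(
                (kind, a.get("name") or "", a.get("value") or "")
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def pretty_print_forms(html):
    """Return a readable, indented summary of all forms in the page."""
    parser = FormPrinter()
    parser.feed(html)
    parser.close()
    lines = []
    for i, form in enumerate(parser.forms, 1):
        lines.append(f"form {i}: {form['method']} {form['action']}")
        for kind, name, value in form["fields"]:
            lines.append(f"  {kind:<10} {name} = {value!r}")
    return "\n".join(lines)
```

Such a summary makes it easy to spot which hidden fields a form submits, which are exactly the values a generated script has to carry over into its next request.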