this is a "setup once, never look back" thing. if you have a bunch of domains (e.g. you are a domain hunter or something, living off clicks and impressions), you can use these perl scripts and apache/webalizer setups to set up your logging/stats facilities.
logsplit is what the name says: a log splitter that writes each virtual host's requests to a log file named after the vhost.
runwebalizer keeps up to $pnum webalizer processes running, starting a new one whenever another finishes, and feeds them the log data. if it finds a log for a virtual host that has not been set up with webalizer yet, the setup is done automagically at runtime.
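a minimal sketch of what runwebalizer does, written in bash rather than the original perl: a pool that keeps at most PNUM jobs running (needs bash >= 4.3 for wait -n), plus a worker that writes a webalizer config the first time it sees a vhost. LOG_DIR, the per-vhost layout under /home/data/stats and the names run_pool/webalize_log are hypothetical; HostName, OutputDir, HistoryName, Incremental and IncrementalName are standard webalizer.conf keywords.

```shell
#!/usr/bin/env bash
# sketch of a runwebalizer-style job pool (not the original perl script)
# assumptions (hypothetical): one log per vhost in LOG_DIR, named after the
# vhost; stats for each vhost go to STATS_ROOT/<vhost>

PNUM=${PNUM:-4}                          # max concurrent webalizer runs
LOG_DIR=${LOG_DIR:-/home/data/logs}      # hypothetical location
STATS_ROOT=${STATS_ROOT:-/home/data/stats}

# run "$worker item" for every item, keeping at most PNUM jobs in flight
run_pool() {
  local worker=$1 item running=0
  shift
  for item in "$@"; do
    if (( running >= PNUM )); then
      wait -n || :        # block until any one job ends (its status is ignored)
      running=$((running - 1))
    fi
    "$worker" "$item" &
    running=$((running + 1))
  done
  wait                    # let the last batch finish
}

# worker: set up webalizer for a vhost on first sight, then process its log
webalize_log() {
  local log=$1 vhost dir conf
  vhost=$(basename "$log")
  dir=$STATS_ROOT/$vhost
  conf=$dir/webalizer.conf
  if [[ ! -f $conf ]]; then
    mkdir -p "$dir"
    cat > "$conf" <<EOF
HostName        $vhost
OutputDir       $dir
HistoryName     $dir/webalizer.hist
Incremental     yes
IncrementalName $dir/webalizer.current
EOF
  fi
  webalizer -c "$conf" "$log"
}
```

usage, assuming one log file per vhost: `run_pool webalize_log "$LOG_DIR"/*`. counting jobs in a variable keeps the pool simple, and wait -n wakes up as soon as any worker exits, which gives the "start new ones as others are done" behaviour described above.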
a few assumptions:
scripts are installed in /home/apps/logging
webalizer configs, database and html output are in /home/data/stats
change these to suit your needs.
apache_setup:
LogFormat "%V %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" catchall_combo
CustomLog |/home/apps/logging/logsplit catchall_combo
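a rough bash/awk sketch of the logsplit idea (not the author's perl script). apache pipes every line in the catchall_combo format to the program's stdin; the first field is %V, so the sketch uses it as the file name and strips it, leaving a plain combined-format line for webalizer. LOG_DIR and the function name are hypothetical, and unsafe or empty host names go to a file called unknown, because %V can come from the client-supplied Host header.

```shell
#!/usr/bin/env bash
# sketch of a logsplit-style piped logger (shell/awk, not the original perl)
# input:  "%V <combined log line>" on stdin, as produced by catchall_combo
# output: one file per vhost in LOG_DIR (hypothetical path), %V stripped

LOG_DIR=${LOG_DIR:-/home/data/logs}

logsplit() {
  awk -v dir="$LOG_DIR" '
  {
    vhost = tolower($1)
    # %V may come from the client Host header: never trust it as a path
    if (vhost !~ /^[a-z0-9][a-z0-9.-]*$/) vhost = "unknown"
    line = $0
    sub(/^[^ ]+ /, "", line)      # drop the leading %V field
    file = dir "/" vhost
    print line >> file            # awk keeps the file open between lines
    fflush(file)                  # write through so the log is never stale
  }'
}
```

to use it as the CustomLog target, save it as /home/apps/logging/logsplit, add a final line that calls logsplit and make the file executable. apache starts a piped logger once and keeps it running, so awk's open file handles stay valid for the life of the server.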
cron setup example:
1 2 * * * root /home/apps/logging/runwebalizer >/dev/null 2>/dev/null
as a note, logsplit shouldn't really be implemented like this (as a piped logger), but i am too lazy to do it with mod_perl and i don't really need it: my websites don't generate that much load on the server side. if someone requests a mod_perl implementation i'll do it in my free time; it would look nicer as a hook in apache's logging phase. i'm just sick of engineering things to death.
this is a multilevel, almost-incremental backup using rsync. it uses filesystem hardlinks to save space on unmodified data: each level looks like a full copy, but unchanged files are shared with the previous level.
modify to suit your needs the root destination directory for backups, the number of backup levels (the 30 in sed -e '1,30d') and the folders you want to back up.
the script can easily be modified to back up the data to a remote host or to multiple servers (local, remote1, ..., remoteN).
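a minimal sketch of the same idea in bash, assuming rsync's --link-dest option does the hardlinking (the original script is not shown here and may do it differently). BACKUP_ROOT, LEVELS and the function names are hypothetical; the sed -e "1,${LEVELS}d" step plays the role of the '1,30d' mentioned above: list the snapshots newest first, skip the first LEVELS, delete the rest.

```shell
#!/usr/bin/env bash
# sketch of a hardlink-based, multilevel rsync backup (not the original script)
# assumptions (hypothetical): BACKUP_ROOT is an absolute path; snapshots are
# directories named YYYYmmdd-HHMMSS, so name order == time order

BACKUP_ROOT=${BACKUP_ROOT:-/backup}
LEVELS=${LEVELS:-30}            # how many snapshots to keep per backup set

# copy $1 into a fresh snapshot of set $2; files unchanged since the newest
# existing snapshot become hardlinks to it instead of new copies
make_snapshot() {
  local src=$1 name=$2 latest dest
  latest=$(ls -1d "$BACKUP_ROOT/$name"/*/ 2>/dev/null | tail -n 1)
  dest=$BACKUP_ROOT/$name/$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dest"
  if [[ -n $latest ]]; then
    rsync -a --link-dest="$latest" "$src/" "$dest/"
  else
    rsync -a "$src/" "$dest/"   # first run: plain full copy
  fi
}

# delete everything except the newest LEVELS snapshots of set $1
prune_snapshots() {
  ls -1d "$BACKUP_ROOT/$1"/*/ 2>/dev/null | sort -r | sed -e "1,${LEVELS}d" |
    while IFS= read -r old; do
      rm -rf -- "$old"
    done
}
```

for example, from cron: `make_snapshot /home home && prune_snapshots home`. since every snapshot is a complete tree, restoring any level is a plain copy, and deleting an old level only lowers the link counts of files that newer levels still share.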
if you use a command-line whois client to dig up information about a domain, you will probably see in the header of the response a phrase like this one: "you are not authorized to process automatically the information from this database". the verisign model. that's ok; besides domain hunters and spammers, no one would need to process the data automatically. but there is one thing that upset me, and it happened to me twice. i was querying the verisign whois servers for a nice-to-brand domain name. i found it to be free, and a few minutes later i went to my registrar account to buy the name. it was already taken. that makes me think verisign sells the input queries to domain hunters and it's not just a coincidence: it was a brandable name, not the search-engine-friendly kind of name usually targeted by domain hunters. twice. so fuck them. here is a script to process whois output automatically. if you need socks support and a small framework to plug in output parsers, call me. at the moment it just binds to a list of ips you give the agent in a file, one per line. it waits a random amount of time between queries, so the whois server should have trouble identifying you as a bot.
whoisagent gets its input (the domains to query) from the main server via the qmgr.cgi script.
whoisagent reports the results back to the web server that hosts qmgr.cgi; the url is configurable in the config.pm file.
you can spread whoisagents worldwide and have them return the data to a main server for further processing. data-processing plugins are installed on the main server side.
whoisagent uses Event::Lib, a perl wrapper for niels provos's libevent.
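the anti-bot tricks described above (a random source ip from a one-per-line file and a random wait before each query) can be sketched in bash like this. it is not the Event::Lib agent: there is no qmgr.cgi exchange and no concurrency. IP_FILE, the 2 to 15 second delay range and the function names are hypothetical, and whois_query assumes an nc that supports -s for the source address (OpenBSD netcat and ncat do).

```shell
#!/usr/bin/env bash
# sketch of the whoisagent anti-bot tricks (bash, not the Event::Lib agent)
# IP_FILE path and the delay range are hypothetical

IP_FILE=${IP_FILE:-/home/apps/whois/ips.txt}   # one local ip per line

# print one random non-empty line of file $1
pick_ip() {
  grep -v '^[[:space:]]*$' "$1" | shuf -n 1
}

# print a random whole number of seconds between $1 and $2 inclusive
random_delay() {
  local min=$1 max=$2
  echo $(( min + RANDOM % (max - min + 1) ))
}

# plain port-43 whois: send the query plus CRLF, read until the server closes.
# assumes an nc where -s sets the source address; the default server is
# verisign's whois for .com/.net
whois_query() {
  local domain=$1 server=${2:-whois.verisign-grs.com} ip
  ip=$(pick_ip "$IP_FILE")
  sleep "$(random_delay 2 15)"
  printf '%s\r\n' "$domain" | nc -s "$ip" "$server" 43
}
```

`whois_query example.com` sends "example.com" plus CRLF to port 43 and prints the reply, which is the whole whois protocol (rfc 3912); parsing the output is left to whatever plugins you hang on it.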
there is a very important distinction you have to consider when you start doing web content management and publishing: document versus data. when you are doing content management/publishing, you are working with documents, not with data. data you host, for the sake of optimization, in relational databases like postgres or mysql. documents don't fit well inside a relational database, though this very weird match is very popular. the popularity of the php+mysql+apache choice for managing content is exceeded only by its stupidity, and still this is how programmers choose to code their application backends.
the very best format you can choose for hosting documents is xml. you can store the xml files either directly on the file system or in berkeley db xml. keeping the documents in xml format keeps them well formed (and valid, if you define a dtd for your xmlized documents) and easy to transform via xslt or to search via xpath/xquery. with very simple xsl templates and a good xslt processor you can easily export your documents to various formats: xhtml, pdf, doc/odt, ps, mail messages or any other kind of message format. xml also lets you sign and encrypt your data very nicely (xml signature, xml encryption).
as for searching documents, you really don't want to do it like "select something from some weird table joins where some regexps and dates and so on group and order". think xpath/xquery instead: a much better fit.
this is how i want to implement the cms for this website. unfortunately i haven't had time yet to engineer it properly, so at the moment it's a weird mess of xml, perl, file system lookups and rewrite rules. there is no dtd describing the xml format, no xsl transforms (HTML::Template is used instead) and no document search implemented yet, just a very rudimentary way of tagging documents for quick lookups.
xml format:
<document>
  <headers>
  </headers>
  <body>
    <item title="">
      item_content
      <link title="" url=""/>
      ...
      <link title="" url=""/>
    </item>
    <item title="">
    </item>
    ...
    ...
    ...
    <item title="">
    </item>
  </body>
</document>
rewrite rules to install in the document root:
RewriteEngine on
RewriteRule ^computing/(.*)\.htm$ /exec/xmltransform.cgi?section=computing&doc=$1 [L]
RewriteRule ^computing/$ /exec/xmltransform.cgi?section=computing [L]
RewriteRule ^computing$ /exec/xmltransform.cgi?section=computing [L]
RewriteRule ^tag/(.*)$ /exec/tag.cgi?tag=$1 [L]
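to show what "think xpath" means against the xml format above, here is a small bash sketch using xmllint from libxml2 (its --xpath option). the helper names are hypothetical, and the paths assume exactly the document/body/item/link layout shown.

```shell
#!/usr/bin/env bash
# query documents in the xml format above with xpath via xmllint (libxml2)
# instead of sql; function names are hypothetical helpers

# number of <item> elements in document $1
doc_item_count() {
  xmllint --xpath 'count(/document/body/item)' "$1"
}

# title attribute of item number $2 (1-based) in document $1
doc_item_title() {
  xmllint --xpath "string(/document/body/item[$2]/@title)" "$1"
}

# every link url in document $1, one per line
doc_link_urls() {
  local n i
  n=$(xmllint --xpath 'count(//item/link)' "$1")
  for (( i = 1; i <= n; i++ )); do
    # capture first so the output is one line whatever xmllint appends
    printf '%s\n' "$(xmllint --xpath "string((//item/link)[$i]/@url)" "$1")"
  done
}
```

e.g. `doc_item_title page.xml 1` prints the first item's title. the same xpath expressions could be used unchanged inside an xsl template.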
this is a function that generates random 8-character passwords from /dev/urandom, alphanumeric characters only. it writes the password to stdout, so you can either pipe it or capture it with backticks (``).
usage example: . ~/scripts/genpass.sh; while read -r newmailuser; do newpass=`genpass`; ~vpopmail/bin/vadduser "$newmailuser" "$newpass"; echo "$newmailuser $newpass"; done < ~/feed > /tmp/output
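a minimal sketch of such a function (not necessarily the exact contents of the genpass.sh.txt file linked below): tr filters /dev/urandom down to [A-Za-z0-9] and head keeps the first 8 characters. the optional length argument is an addition.

```shell
#!/usr/bin/env bash
# genpass sketch: random alphanumeric password from /dev/urandom to stdout
# the optional length argument is an addition; the default 8 matches the text
genpass() {
  # LC_ALL=C makes tr treat input as raw bytes (some trs reject invalid
  # multibyte sequences otherwise); -dc deletes every byte NOT in the set
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-8}"
  echo    # terminate the line
}
```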
if you want to use it directly from this website you can do:
echo -e "`wget -O - http://pub.mud.ro/~cia/files/scripts/genpass.sh.txt 2>/dev/null ` \n genpass" | sh
though POE isn't really an event-driven framework with reusable components anymore (was it ever?), it is the best framework you can get for smart network programming in perl, once you have spent quite some time understanding it, POE that is. this script is a mass mx resolver used to check for bogus domains in a list that you feed to the script via stdin. the code can easily be modified to run whatever dns checks you want against lists of domains. it has a small local cache implemented as a hash table (not so smart) that is checked first, because your domain list is likely not deduplicated (sorting/uniquing a big list in bash takes so long that you'll end up hitting ctrl-c).
you must love perl when it comes to text processing; when it comes to quick network programming, you'll love it even more.
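a sequential bash sketch of the same check, for comparison with the POE version: no parallel queries, just a hash-table cache (a bash 4 associative array) consulted before every lookup, with dig doing the dns work. the function names are hypothetical, and treating "no mx records" as bogus is a simplification, since mail can still be delivered to a domain's a record when it has no mx.

```shell
#!/usr/bin/env bash
# sequential sketch of the mass mx check (not the POE script, no parallelism)
# needs bash >= 4 (associative arrays, ${var,,}) and dig

declare -A MX_CACHE=()

# raw lookup: print the mx records of $1, nothing if there are none
# (lines starting with ";" are dig error messages, not records)
mx_lookup() {
  dig +short MX "$1" 2>/dev/null | grep -v '^;' || true
}

# print "<domain> ok" or "<domain> bogus", asking the cache first
check_domain() {
  local domain=${1,,}                 # dns names are case-insensitive
  if [[ -z ${MX_CACHE[$domain]+set} ]]; then
    MX_CACHE[$domain]=$(mx_lookup "$domain")
  fi
  if [[ -n ${MX_CACHE[$domain]} ]]; then
    printf '%s ok\n' "$domain"
  else
    printf '%s bogus\n' "$domain"
  fi
}

# check every domain read from stdin, one per line
check_list() {
  local d
  while IFS= read -r d; do
    if [[ -n $d ]]; then check_domain "$d"; fi
  done
}
```

usage: `check_list < domains.txt > report.txt`. a repeated domain is answered from MX_CACHE, so duplicates in an unsorted list cost no extra dns queries.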