Building R2G Tools

I started out building a tool to discover recently dropped domains, and in the process built a domain search engine with a twist. You can check it out at R2G Tools. This was an extremely challenging project: there are more than 120 million domains in the world, and my budget was extremely tight. I finally pulled it off last weekend. You would be surprised to learn that whois.r2g.in is running off 3 commodity servers (mainly for redundancy; I could have squeezed it onto 1 with only a slight performance hit).

What’s behind it

Database; MongoDB is the main database backend, with MySQL used to hold non-domain records. MongoDB was chosen for its ability to handle huge numbers of objects, its schemaless design and its ability to index keys. Thanks to MongoDB I can search, count and even insert and update records in a few seconds (most operations take only a few milliseconds).
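
To give a feel for why MongoDB fits here, a minimal sketch using the mongo shell; the database name, collection and field names below are hypothetical, not my actual schema:

[source lang='bash']# index the domain name key once; lookups then come back in milliseconds
mongo whois --eval 'db.domains.ensureIndex({name: 1})'
# exact-match lookup on the indexed key
mongo whois --eval 'db.domains.find({name: "example.com"}).forEach(printjson)'
# count how many domains dropped on a given day
mongo whois --eval 'print(db.domains.count({dropped: new Date("2009-08-01")}))'[/source]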

Web application framework; whois.r2g.in is built entirely with CodeIgniter. It’s my framework of choice: I like its clean and lean architecture.

Servers; I’m running Lighty behind the Varnish HTTP accelerator. I also use eAccelerator to optimize PHP code, and a memcached instance on the web server caches data from MySQL. The goal is to speed up whois.r2g.in as much as possible. There are 2 MongoDB instances sharing the load, to which the application connects. Everything runs off 3 Athlon X2 servers: 2 running the MongoDB instances and one running the application.
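
For the curious, the layering is straightforward: Varnish answers on port 80 and proxies cache misses to Lighty on a high port. A sketch of the idea; the ports and storage path are assumptions, not my exact setup:

[source lang='bash']# Varnish on :80, Lighty assumed to be listening on 127.0.0.1:8080
varnishd -a :80 -b 127.0.0.1:8080 -s file,/var/lib/varnish/storage.bin,512M[/source]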

Why use both MongoDB and MySQL?

That’s because I didn’t want to put all my eggs in one basket. Any data that doesn’t change much, like the list of WHOIS servers, is put into a MySQL table. Frequently changing information, such as the WHOIS records themselves, is stored in a MongoDB collection.

What did I gain?

A lot of experience in mining, storing and analyzing data. I learned a lot about optimizing data mining: I managed to bring the time it takes to analyze all the domains for drops down from a few days to a few hours just by pre-sorting the zone files. I also picked up a few rare domains and high-PageRank domains. I have become a domainer thanks to the project 😀

Please head over to R2G Tools and give it a try; you might discover a great domain while you are there and make a huge profit. Do not forget to send me some feedback 🙂

Fancy HTTP Error pages – 5xx

In case you hadn’t noticed, my site was giving HTTP 500 errors for the last couple of days. The issue turned out to be a segfault, and it’s fixed now. That got me to come up with a set of funny and slick HTTP error pages. I only came up with HTTP 5xx error pages; I believe HTTP 4xx error pages should be specific to the site. You can download them here. If you want to take a peek, here is the list: HTTP 500, 501, 502, 503, 504 and 505. Feel free to modify them and share what you come up with.

Load balanced and High Availability cluster for your web site under USD 60 pm – Part 2

Update 2009-09-02: Now I’m using a single Linode and a Xen VPS from my very own hosting service. This means the VPSes have one less thing in common: the hosting company.

As I promised, here is the post discussing in detail how I configured my 2-node cluster to host my sites.

Setting up SSH tunnels

You have to set up an SSH tunnel between the nodes. In order to do that you need to allow restricted root logins into your nodes. Using your favourite text editor, edit /etc/ssh/sshd_config and change the PermitRootLogin line to PermitRootLogin forced-commands-only. Since the tunnel uses a tun device, sshd also needs PermitTunnel yes (it defaults to no).
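
For the impatient, the whole sshd change can be scripted; a sketch, assuming the stock Debian sshd_config:

[source lang='bash']# allow root logins for forced commands only, and permit tun devices
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin forced-commands-only/' /etc/ssh/sshd_config
grep -q '^PermitTunnel' /etc/ssh/sshd_config || echo 'PermitTunnel yes' >> /etc/ssh/sshd_config
/etc/init.d/ssh reload[/source]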

Then generate SSH authentication keys for all your nodes and add the public keys to /root/.ssh/authorized_keys on the other nodes. Keys can be generated by running ssh-keygen. By default your private key is stored in /root/.ssh/id_rsa and your public key in /root/.ssh/id_rsa.pub. Your public key will look similar to the one below (key shortened for brevity).

[source lang='plain' options='toolbar: false; gutter: false;' ]ssh-rsa AAAA...w== [email protected][/source]

To enable tunnel-only access via root you need to add tunnel="0",command="/sbin/ifdown tun0;/sbin/ifup tun0" before your public key in /root/.ssh/authorized_keys. Your /root/.ssh/authorized_keys will look something like below.

[source lang='plain' options='toolbar: false; gutter: false;' ]tunnel="0",command="/sbin/ifdown tun0;/sbin/ifup tun0" ssh-rsa AAAA...w== [email protected][/source]

Now set up the actual tunnel. Add the following lines to /etc/network/interfaces on the “server”

[source lang='plain']
auto tun0
iface tun0 inet static
address 10.100.2.1
netmask 255.255.255.0
pointopoint 10.100.2.2
[/source]

and the following on the “client”

[source lang='plain']
auto tun0
iface tun0 inet static
pre-up ssh -S /var/run/ssh-myvpn-tunnel-control -M -f -w 0:0 example.com true
pre-up sleep 5
address 10.100.2.2
pointopoint 10.100.2.1
netmask 255.255.255.0
up route add -net 10.100.2.0 netmask 255.255.255.0 gw 10.100.2.1 tun0
post-down ssh -S /var/run/ssh-myvpn-tunnel-control -O exit example.com
[/source]

Now you only have to restart networking to bring the tunnel up. Your nodes will then be on their own VPN.
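
To check that the tunnel actually came up, something like the following on the client should do; the address is the server end as configured above:

[source lang='bash']/etc/init.d/networking restart
# the server end of the point-to-point link should answer
ping -c 3 10.100.2.1
ifconfig tun0[/source]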

Setting up document root replication (rsync)

Share /var/www via rsync. You need to install rsync and add the following to /etc/rsyncd.conf if it is not already there.

[source lang='plain']max connections = 2
log file = /var/log/rsync.log
timeout = 300

[www]
comment = DOC Root
path = /var/www
read only = yes
list = yes
uid = www-data
gid = www-data
auth users = replicator
secrets file = /etc/rsyncd.secrets[/source]
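
The config above references an auth user and a secrets file that still have to be created, and on Debian the standalone rsync daemon is disabled by default. A sketch of those remaining steps; the password is a placeholder you should change:

[source lang='bash']# daemon side: user:password pairs, must not be world readable
echo 'replicator:CHANGE_ME' > /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets
# client side: just the password, readable by the www-data cron jobs below
echo 'CHANGE_ME' > /etc/rsync.secrets
chown www-data:www-data /etc/rsync.secrets && chmod 600 /etc/rsync.secrets
# enable and start the standalone daemon on Debian
sed -i 's/RSYNC_ENABLE=false/RSYNC_ENABLE=true/' /etc/default/rsync
/etc/init.d/rsync start[/source]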

Add the following cron jobs to the www-data crontab (crontab -u www-data -e as root). This one goes on node2 and pulls the document root from node1 (10.100.2.1):

[source lang='plain' options='gutter: false; toolbar: false;' ]
*/10 * * * * test -e /tmp/rsync.docroot.lock || { touch /tmp/rsync.docroot.lock; rsync -aP rsync://[email protected]/www/ /var/www/ --password-file=/etc/rsync.secrets --contimeout=30 > /dev/null 2>&1; rm /tmp/rsync.docroot.lock; }[/source]

and this one goes on node1, pulling from node2 (10.100.2.2):

[source lang='plain' options='gutter: false; toolbar: false;' ]
*/10 * * * * test -e /tmp/rsync.docroot.lock || { touch /tmp/rsync.docroot.lock; rsync -aP rsync://[email protected]/www/ /var/www/ --password-file=/etc/rsync.secrets --contimeout=30 > /dev/null 2>&1; rm /tmp/rsync.docroot.lock; }[/source]
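
Before trusting cron with it, it is worth doing one manual pull as www-data; a sketch that syncs into a scratch directory so the live document root is untouched:

[source lang='bash']# confirm auth and module config without touching the live doc root
sudo -u www-data rsync -aP rsync://[email protected]/www/ /tmp/www-test/ --password-file=/etc/rsync.secrets --contimeout=30[/source]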

Setting up session_mysql

Next let us set up session_mysql so that we can forget about replicating PHP sessions 🙂 .

Install php5-dev and libmysql++-dev, download session_mysql and extract it, then run the following commands as root inside the extracted directory.

[source lang='bash']export PHP_PREFIX='/usr'
$PHP_PREFIX/bin/phpize
./configure --enable-session-mysql --with-php-config=$PHP_PREFIX/bin/php-config --with-mysql=$PHP_PREFIX
make
make install[/source]
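
If the build went through, make install should have dropped the module into PHP’s extension directory; a quick check (run in the same shell, since it reuses $PHP_PREFIX):

[source lang='bash']# the .so has to exist before the ini below can load it
ls "$($PHP_PREFIX/bin/php-config --extension-dir)"/session_mysql.so[/source]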

Create the database to store the session data with the following SQL

[source lang='sql']
create database phpsession;
grant all privileges on phpsession.* to phpsession identified by 'phpsession'; -- CHANGE DEFAULT PASSWORD
create table phpsession(
sess_key char(64) not null,
sess_mtime int(10) unsigned not null,
sess_host char(64) not null,
sess_val mediumblob not null,

index i_key(sess_key(6)),
index i_mtime(sess_mtime),
index i_host(sess_host)
);[/source]

Add the following to your php.ini (or /etc/php5/conf.d/session_mysql.ini)

[source lang='plain']
session.save_handler = 'mysql'
session_mysql.db='host=localhost db=phpsession user=phpsession pass=phpsession'
[/source]

Do not forget to change the default password. Restart Apache or Lighttpd (or any other web server you are using).
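
A quick sanity check that sessions really land in MySQL, assuming the CLI picks up the same conf.d as your web server; user, password and table are the ones created above:

[source lang='bash']# write a session from the CLI, then look for its row
php -r 'session_start(); $_SESSION["probe"] = "ok";'
mysql -u phpsession -pphpsession phpsession -e 'select sess_key, sess_mtime from phpsession;'[/source]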

MySQL asynchronous two-way replication

I’m sure some of you are asking why I went for asynchronous replication. The main reasons are flexibility and the small number of nodes (my cluster is just 2 nodes).

Stop MySQL from listening only to local connections by commenting out bind-address in /etc/mysql/my.cnf on all nodes. Remember to review your user table (mysql.user) to make sure you don’t grant wildcard access like 'user'@'%'. Then add the following to node1

[source lang='plain']server-id = 1
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 1
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M

master-host = 10.100.2.2
master-user = slave_user_2
master-password = your$password
master-connect-retry = 60[/source]

and following to node2

[source lang='plain']server-id = 2
replicate-same-server-id = 0
auto-increment-increment = 2
auto-increment-offset = 2
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M

master-host = 10.100.2.1
master-user = slave_user_1
master-password = your$password
master-connect-retry = 60[/source]

Now create the users, granting them only replication rights. Also make sure you specify the hostname or the IP, so that no one else can siphon off your data 😀 . The following SQL creates the users used in the example: slave_user_1 is the account node2 uses when connecting to node1 (so its host part is node2’s tunnel IP), and slave_user_2 is the account node1 uses when connecting to node2. Run the commands on both nodes so that the data on either node is identical.

[source lang='sql']CREATE USER 'slave_user_1'@'10.100.2.2' IDENTIFIED BY 'your$password';

GRANT REPLICATION SLAVE ON *.* TO 'slave_user_1'@'10.100.2.2' IDENTIFIED BY 'your$password' WITH MAX_QUERIES_PER_HOUR 0 MAX_CONNECTIONS_PER_HOUR 0 MAX_UPDATES_PER_HOUR 0 MAX_USER_CONNECTIONS 0 ;

CREATE USER 'slave_user_2'@'10.100.2.1' IDENTIFIED BY 'your$password';

GRANT REPLICATION SLAVE ON *.* TO 'slave_user_2'@'10.100.2.1' IDENTIFIED BY 'your$password' WITH MAX_QUERIES_PER_HOUR 0 MAX_CONNECTIONS_PER_HOUR 0 MAX_UPDATES_PER_HOUR 0 MAX_USER_CONNECTIONS 0 ;[/source]

Now start MySQL and run the following at the mysql prompt on each of the nodes.

[source lang='sql']reset master;
stop slave;
start slave;[/source]
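
If everything is wired up, both replication threads should report Yes on each node; a quick check:

[source lang='bash']# Slave_IO_Running and Slave_SQL_Running should both say Yes
mysql -u root -p -e 'show slave status\G' | grep -E 'Slave_(IO|SQL)_Running|Last_Error'[/source]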

Finally

Now you have a cluster of 2 nodes where you can run your PHP site. Your databases are replicated, your user session data is replicated and your document root is replicated. Have fun, and if you run into issues please post them as a comment.

Load balanced and High Availability cluster for your web site under USD 60 pm

Update 2009-09-02: Now I’m using a single Linode and a Xen VPS from my very own hosting service. This means the VPSes have one less thing in common: the hosting company.

Until recently I used one Linode VPS for hosting all my sites. On the 26th of March, there was a DDoS attack on one of the Linode customers in the Fremont datacenter (where my node was as well). This made my sites inaccessible for a couple of hours, and got me thinking about what could be done to mitigate such downtime. The answer, of course, is a load balanced and high availability cluster. I couldn’t afford 2 dedicated servers to do this, but I certainly could afford 2 Linodes 🙂 . I’ll try to explain how I set up a load balanced, high availability, shared-nothing cluster using Linodes (you can use any VPS or dedicated server). I used two Linode 540s for the job.

All of my web sites use either PHP, Python or Perl, and all of them use MySQL as the database. The problems I had to solve were:

  1. replicate files across the nodes
  2. replicate databases across the nodes
  3. replicate session (PHP session variables) across the nodes

All the replication needs to be done securely, so I went for an SSH tunnel between the nodes of the cluster, over which I’ll:

  1. use rsync to replicate/synchronize the document root
  2. use MySQL asynchronous replication (not a NDBCLUSTER) to synchronize data across the nodes
  3. use session_mysql PECL extension to store PHP session in MySQL database transparent to all applications

Check back next week, when I’ll post configuration examples showing how I configured my servers. If you are in a hurry, the pointers above are enough to get you started.

WSO2 WSF/PHP with Lighttpd

I wanted to test drive WSO2 WSF/PHP on Lighttpd because I couldn’t find any documentation specific to Lighttpd, or anyone complaining that it cannot be done. I set up a new VMware image running Debian so that I could blog all the steps involved in getting WSO2 WSF/PHP working on Lighttpd on Debian.

Step 1: Install Lighttpd, PHP5

I used apt-get to install Lighttpd and PHP5

 $ sudo apt-get install lighttpd php5 

Step 2: Download and install WSO2 WSF/PHP.

I downloaded the Debian package.

 $ axel -an 5 http://dist.wso2.org/products/wsf/php/wso2-wsf-php-1.2.0-debian.deb
$ sudo dpkg -i wso2-wsf-php-1.2.0-debian.deb

Step 3: Enable WSO2 WSF/PHP

I created a new file /etc/php5/conf.d/wsf.ini and added the following line.

 extension=wsf.so 

Step 4: Enable FastCGI and PHP

The fastest way to run PHP on Lighttpd is FastCGI, so we will be enabling it.

 $ sudo lighty-enable-mod fastcgi

On Debian, the Lighttpd FastCGI configuration file ships with the configuration for PHP4. We will have to edit /etc/lighttpd/conf-enabled/10-fastcgi.conf to look like below.

server.modules   += ( "mod_fastcgi" )

## Start an FastCGI server for php5 (needs the php5-cgi package)
fastcgi.server = ( ".php" =>
  ((
    "bin-path" => "/usr/bin/php5-cgi",
    "socket" => "/tmp/php.socket",
    "max-procs" => 2,
    "idle-timeout" => 20,
    "bin-environment" => (
      "PHP_FCGI_CHILDREN" => "4",
      "PHP_FCGI_MAX_REQUESTS" => "10000"
    ),
    "bin-copy-environment" => (
      "PATH", "SHELL", "USER"
    ),
    "broken-scriptfilename" => "enable"
  ))
)

Step 5: Restart Lighttpd

You have to reload the Lighttpd configuration files.

 $ sudo /etc/init.d/lighttpd restart

You have successfully installed WSO2 WSF/PHP on PHP5 and Lighttpd. It is time to test whether it was a success. The simplest approach is to look at the phpinfo() page. Create a PHP file containing the following line of code and place it in the document root.

 <?php phpinfo(); ?>

Then go to that URL in a web browser and search the page for the wsf section, which lists all the configuration of the WSF extension.

I went through the trouble of actually consuming a SOAP web service to confirm that this setup really works, and it was a success, but that is out of scope for this post. These instructions should work on other Linux distributions with minor changes, and on any other platform with a few changes.

Fly WSO2 WSF/PHP with Lighttpd. Have fun.

Flying light with lighty

I moved all my sites to my brand new server, where I’m running Lighttpd as the front-facing web server. I do have Apache HTTP Server running, but only for serving svn. It was not very hard to migrate the sites from Apache HTTP Server to Lighttpd. The only feature I missed was .htaccess file support (or a substitute); I just had to move all the rules from the .htaccess files into the Lighty configuration file.

Overall the migration was smooth, and I have nothing to complain about: the memory footprint is as small as it could get. Since I’m serving only PHP and Python I’m making use of FastCGI, and it is really fast. You wouldn’t believe me if I told you the performance gains: I can serve 700 requests per second from my Geo-IP web service (I believe the limit was the resources of the test machine), and the server is not even sweating. Running the same application on Apache HTTP Server, it would barely serve 230 requests per second, so that’s a 204% performance gain.
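
For the record, the numbers above came from simple HTTP benchmarking with ApacheBench; something along these lines reproduces the test, though the URL below is a placeholder and the request count and concurrency are assumptions:

 $ ab -n 10000 -c 100 http://example.com/geoip/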

If you visit any of my sites except for the blog itself (which is hosted at Blogger.com) you will see the performance. mohanjith.net responds within a second; that’s lightning fast. All this with Debian running on Xen with 128MB of physical memory and 256MB of swap.

I would recommend Lighty to anyone with simple serving requirements. It saves a lot of server resources.