Saturday, March 31, 2012

How to install a counter on any web application

Some web applications have no common output template into which you could simply insert tracking code (Google Analytics or Yandex Metrika). In such applications, HTML pages are generated in different ways and from different templates. To insert the tracking code into such a system, you can use Apache HTTPD.

The logic is as follows:

1) HTML traffic is intercepted on the fly;

2) The intercepted text is searched for a place to insert the tracking code. Usually this is "</body>", since counters are normally inserted just before this tag (you can also look for "</head>");

3) Each match is replaced by the counter code followed by the original tag;

4) Traffic is sent to the client.
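
As a rough illustration of steps 2 and 3, here is a minimal Python sketch of the replacement itself. The names insert_counter and COUNTER_SNIPPET are hypothetical, and the snippet is only a placeholder for real counter code. Unlike the sed expression used later in this post, it keeps the original letter case of the tag.

```python
import re

# Hypothetical placeholder; the real counter code would go here.
COUNTER_SNIPPET = "<!-- counter -->"

# Match the closing body tag in any letter case.
BODY_CLOSE = re.compile(r"</body>", re.IGNORECASE)


def insert_counter(html: str, snippet: str = COUNTER_SNIPPET) -> str:
    """Insert snippet immediately before every closing body tag."""
    # A function is used as the replacement so that backslashes in the
    # snippet are never interpreted as regex escapes.
    return BODY_CLOSE.sub(lambda m: snippet + m.group(0), html)


page = "<html><body><p>Hello</p></BODY></html>"
print(insert_counter(page))
```

A page without a closing body tag passes through unchanged, so non-matching responses are not damaged.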

You can intercept traffic with Apache by using it as a proxy. The text can be replaced with the sed utility (there is a version for Windows too). In addition, you must make sure the replacement runs on plain, uncompressed text, because the backend server may compress its responses itself. So you need to decompress the traffic before the replacement and compress it again afterwards.
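
The decompress, replace, recompress order can be sketched in Python as follows. This only illustrates the idea on a gzip-compressed body; in the setup described in this post, Apache filters do this work. The function name rewrite_compressed and the snippet are assumptions made for the example.

```python
import gzip
import re


def rewrite_compressed(body: bytes, snippet: bytes) -> bytes:
    """Decompress a gzip body, insert snippet before </body>, recompress."""
    plain = gzip.decompress(body)  # 1. unpack to plain text
    patched = re.sub(              # 2. replace in the plain text
        rb"</body>",
        lambda m: snippet + m.group(0),
        plain,
        flags=re.IGNORECASE,
    )
    return gzip.compress(patched)  # 3. pack again for the client


original = gzip.compress(b"<html><body>Hi</body></html>")
result = rewrite_compressed(original, b"<!-- counter -->")
print(gzip.decompress(result).decode())
```

Running the replacement directly on the compressed bytes would not work, because the tag does not appear there as plain text.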

You can use the following Apache settings:

Listen *:80
Listen 127.0.0.1:20001

<VirtualHost 127.0.0.1:20001>
  # Integration with application
  ProxyPreserveHost On
  ProxyPass / ajp://127.0.0.1:8009/ min=5 ttl=120 keepalive=On ping=1
  ProxyPassReverse / ajp://127.0.0.1:8009/
  
  # Filter to replace text
  ExtFilterDefine metrika mode=output \
    cmd="C:/apache_scripts/metrika_filter.cmd" \
    intype=text/html
  
  # Unpacking the traffic and the replacement
  AddOutputFilterByType INFLATE;metrika text/html
</VirtualHost>

<VirtualHost *:80>
  # Proxy for the compression of the resulting traffic
  ProxyPreserveHost On
  ProxyPass / http://127.0.0.1:20001/ min=5 ttl=120 keepalive=On ping=1
  ProxyPassReverse / http://127.0.0.1:20001/
  
  # Compression of html-traffic
  AddOutputFilterByType DEFLATE text/html
</VirtualHost>

This creates two virtual hosts, one of which proxies the other's traffic. In theory, all the filters could be applied in the correct order within a single host, but in practice this does not work: by default, decompression is performed first, then compression, and only then the replacement. Of course, a different order is needed. Using mod_filter with its filter chaining mechanism does not help, since it does not interact with filters defined by ExtFilterDefine (possibly due to bugs in the module implementations for Apache 2.2).

Code of metrika_filter.cmd:

@echo off
set _command="s/<\/body>/<!-- Yandex.Metrika counter --><script type='text\/javascript'>(function (d, w, c) { (w[c] = w[c] || []).push(function() { try { w.yaCounterXXXXXX = new Ya.Metrika({id:XXXXXX, enableAll: true, ut:'noindex'}); } catch(e) {} }); var n = d.getElementsByTagName('script')[0], s = d.createElement('script'), f = function () { n.parentNode.insertBefore(s, n); }; s.type = 'text\/javascript'; s.async = true; s.src = (d.location.protocol == 'https:' ? 'https:' : 'http:') + '\/\/mc.yandex.ru\/metrika\/watch.js'; if (w.opera == '[object Opera]') { d.addEventListener('DOMContentLoaded', f); } else { f(); } })(document, window, 'yandex_metrika_callbacks');<\/script><noscript><div><img src='\/\/mc.yandex.ru\/watch\/XXXXXX?ut=noindex' style='position:absolute; left:-9999px;' alt='' \/><\/div><\/noscript><!-- \/Yandex.Metrika counter --><\/body>/gi"
sed %_command%
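
An external filter is simply a program that reads the response on standard input and writes the result to standard output, so sed is not the only option. Below is a hedged Python sketch of an equivalent filter. The name filter_stream and the placeholder snippet are hypothetical, and this variant was not part of the original setup. Like sed, it works line by line.

```python
import io
import re

SNIPPET = b"<!-- counter -->"  # hypothetical placeholder for the counter code
BODY_CLOSE = re.compile(rb"</body>", re.IGNORECASE)


def filter_stream(src, dst, snippet=SNIPPET):
    """Copy src to dst line by line, inserting snippet before each </body>."""
    for line in src:
        dst.write(BODY_CLOSE.sub(lambda m: snippet + m.group(0), line))
    dst.flush()


# Demonstration on in-memory streams. When run as an external filter, the
# call would be filter_stream(sys.stdin.buffer, sys.stdout.buffer) instead.
src = io.BytesIO(b"<p>text</p>\n</BODY>\n</html>\n")
dst = io.BytesIO()
filter_stream(src, dst)
print(dst.getvalue().decode())
```

Binary streams are used so that the page encoding and line endings pass through unchanged.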

In any case, this solution is not suitable for permanent use. It should be applied only for a limited period, to assess the server load and gather statistics.
