301 Redirect Subdomain Forwarding On AWS Route 53

amazon-web-services-logo-large

Recently I decided to ditch registrar DNS managers like GoDaddy’s in favor of Amazon’s Route 53. I really like the console, the DNS is snappy and responsive, and there are handy features for creating records that point to Load Balancers for your TLD. However, one drawback of Route 53 is the lack of subdomain forwarding. Below is a quick solution.

All you will need is an Amazon S3 bucket, the static website end-point, and a Route 53 CNAME.

1.) Create an S3 Bucket: Name the bucket after the exact subdomain you wish to forward, e.g. subdomaintoForward.mydomain.com; if the name doesn’t match the subdomain exactly, Route 53 won’t resolve it. Choose the region and create!

create-a-bucket

2.) Enable S3 Website Hosting: Open your new bucket and click the “Properties” tab. Open up “Static Website Hosting”, select “Enable website hosting”, and set the “Index document” to simply index.html. After that, click “Edit Redirection Rules” to open up a text box.

bucket-properties

Paste in redirection rules XML along these lines:
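<RoutingRules>
  <RoutingRule>
    <Redirect>
      <Protocol>http</Protocol>
      <HostName>domaintoforwardto.com</HostName>
      <ReplaceKeyWith>anything/after/thedomain/index5.html</ReplaceKeyWith>
      <HttpRedirectCode>301</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>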

This breaks down the URL that you want the subdomain to forward to. In this example, I want subdomaintoForward.mydomain.com to 301 redirect to http://domaintoforwardto.com/anything/after/thedomain/index5.html – compare this URL to the XML above and you should be able to figure out the syntax.

Click Save!

3.) Create A CNAME in Route 53: Make note of the “Static Website Hosting” end-point and copy it, then open up Amazon Route 53. Select your Hosted Zone and click “Go to Record Sets”. Click “Create Record Set” and configure your record. In this example, subdomaintoForward is the subdomain; use CNAME as the type and paste the static website hosting end-point into Value. Create the record.
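The finished record looks something like this (the end-point host name varies by bucket region; us-east-1 is shown here as an assumption):

Name:  subdomaintoForward.mydomain.com
Type:  CNAME
Value: subdomaintoForward.mydomain.com.s3-website-us-east-1.amazonaws.com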

amazon-route-53-301-redirect

Monokai For Google Chrome – Inspect Element

monokai-chrome-developer-tools

I found an interesting repository that web developers might find useful. Monokai for the Chrome Developer Tools! Honestly, this makes the Inspect Element UI more readable for me. I often find myself staring at the source code for hours until my eyes bleed. Now if only I could remember how to style view-source…

Link to the repository:

https://github.com/bjmatt/monokai-theme-chromedevtools

Transferring WIX Domain To Another Registrar

sad-wix

Googling around, I found this buried in random forum posts, so I figured I’d document it for future reference.

To transfer your Wix domain from Wix:
1. From your Domain Manager, under My Domains, next to your Wix domain, click Manage.
2. Under Domain Summary, click the Advanced tab.
3. Next to Transfer Away from Wix, click Transfer.

wix-domain-transfer-1

4. In the Transfer window, click Send.

wix-domain-transfer-2

5. An email with a code will be sent to you. Forward this code to your new domain registrar.
You may not receive the email immediately, but you can expect to receive it on the same day.

Source: Dana from WIX Support, Screen Caps: myself.

At this point, you will have to go to a new registrar, such as Namecheap.com, and create an account. On the registrar side, you can initiate the domain transfer. Note: be sure to jot down all DNS zone information from the WIX Domain Manager panel and re-enter it at the new registrar.

How To Push To Github And WordPress.org

I think this is really important and not many people are aware of it. Evan Solomon developed a tool called Scatter. A lot of us favor Git over SVN; SVN feels like stepping backwards in time, and anyone who is current on Git or started out primarily on Git may be turned off by it. Welp, kick it in the nuts.

http://evansolomon.me/notes/git-wordpress-plugins-and-a-bit-of-sanity-scatter/

Varnish HTTP Accelerator Presentation Notes

varnish-cache-image

Poul-Henning Kamp discusses Varnish HTTP Cache at TYPO3 2010 in Frankfurt. In this post, I will provide detailed notes of his presentation. This is for learning purposes and may contain inconsistencies. I will do my best to keep it sharp.

Varnish HTTP Accelerator Presentation Slides

Notes From Varnish HTTP Cache @ TYPO3 w. Poul-Henning Kamp

Content creation can be automated: you have a master copy and you want to distribute it as widely as possible. You really want it to be faster than the Linotype, and a step beyond the Heidelberg printing press. Varnish’s elevator pitch: “Varnish delivers content faster & reliably, reduces the load on your CMS database, cheap hardware does 100+ kreq/s, can assist in content composition, can fix stupid mistakes, is fast, is Free & Open Source Software, and has commercial support.”

After 15 years of contributing to the FreeBSD project, Poul-Henning Kamp was approached by VG.no to write an HTTP accelerator. VG.no runs a slow CMS and needed a caching layer to speed up HTTP delivery. With Varnish, Kamp was able to replace 12 Squid cache servers with a mere 3 Varnish cache servers while greatly reducing response times.

Poul-Henning Kamp’s goals for Varnish started out simple:

  • Varnish is only an HTTP accelerator.
  • Better configuration.
  • Much faster.
  • A content-management-focused feature set.

“We don’t do FTP, we do HTTP and we do it damn well,” Kamp continues, “This is not a cache to put on the client side, this is a cache to put on the server side. It’s important to understand one thing here: the controlling standard for HTTP is still RFC2616, and if you read it real carefully you will find one place where it mentions a cache on the server side. Then you realize that a cache on the server side is actually just another web server.”

A cache on the client side has constraints: you cannot cache per user, private content, crypto, etc. Varnish, as a web server, can cache anything we want on the server side, including crypto!

Understanding Varnish and VCL

Varnish aims to make the configuration process simple with VCL, which is translated to C and compiled. It’s important to understand the operation of Varnish according to the state-machine diagram below.

varnish-state-machine

An example of vcl_recv code is below.

sub vcl_recv {
    if (req.url ~ "\.\.|\.exe") {
        error(999, "Bugger off.");
    }
    if (client.ip ~ editor_ip) {
        set req.http.x-cms = "no stats";
        return(pass);
    }
    if (req.url ~ "\.(jpg|png|gif|css)$") {
        unset req.http.cookie;
        unset req.http.authenticate;
        set req.backend = static_backend;
    }
    if (req.url == "hotstory.hmtl") {
        set req.url = "hotstory.html";
    }
}

Understanding Why Varnish Cache Language Is Epic

  • Compiled to binary/shlib via C-code
    • Runs full speed
  • You can have multiple VCLs loaded at the same time
    • Switch between them without restart (see the CLI sketch after this list)
    • Instantaneous
  • Allows you to do anything you might fancy
    • Inline-C code, ’nuff said.
    • Modules/shlib will make it easier (3.0 feature)
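For example, loading a new VCL and switching to it happens live through the management CLI; a minimal sketch (the configuration name and file path are made up):

vcl.load newconfig /etc/varnish/new.vcl
vcl.use newconfig
vcl.list

The old VCL stays loaded, so you can switch back just as instantly.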

Wikia has published its 4000+ lines of VCL code online via SVN for all to study.

An example of how Wikia’s VCL code utilizes multiple Varnish servers: there are two Varnish servers, one in Germany and one in England, and a client located in Germany attempting to access a server in the US. The client hits the German Varnish server first, but that server tells the client to connect to the England Varnish server rather than the US one, because England’s pipe to the United States is bigger. This speeds up load times since Germany’s direct tube to the US is clogged. You can’t just dump stuff on it, it’s a truck.

sub vcl_recv {
    if (client.ip == "varnish1") {
        set req.backend = usa;
    } else {
        set req.backend = england;
    }
}
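For a snippet like this to compile, the backends it names have to be declared first. A minimal sketch, with made-up hosts:

backend usa {
    .host = "us.example.com";
    .port = "80";
}

backend england {
    .host = "uk.example.com";
    .port = "80";
}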

Managing Varnish

Varnish runs as a daemon process and exposes a Command Line Interface for real-time control. There is a management/worker process split: the manager can (re)start the worker process, which allows privilege separation, while the worker process does the multithreaded cache work.

varnish_architecture_shah_anand

Image Source: shah-anand.com (thx bro).

Looking over the Varnish architecture, we have one binary program that runs as two processes, the Manager process and the Cacher process. The Manager takes your VCL code, sends it to the C compiler, and the resulting shared object is loaded by the Cacher process. Varnish also has a cluster-control concept that could control 10 different instances of Varnish in separate geolocations; the cluster control is still a concept and is not built out.

One thing to note: Varnish does not write log files – it has a shared memory segment where it places log records and statistics, and other applications pull from that shared memory.

Brief CLI Management

An example of CLI management is below. (I am a bit lost on this one because param.show throws an error on my instance; the management CLI normally listens on its own admin port, set with -T at startup, so port 80 may simply be the wrong door.)

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'
param.show
200 675
default_ttl          120 [seconds]
thread_pools         5 [pools]
thread_pool_max      1500 [threads]
thread_pool_min      1 [threads]
thread_pool_timeout  120 [seconds]
overflow_max         100 [%]
http_workspace       8192 [bytes]
sess_timeout         5 [seconds]
pipe_timeout         60 [seconds]
send_timeout         600 [seconds]
auto_restart         on [bool]
[...]

Performance & Speed

Varnish was programmed for performance from day one; the aim was wire speed. Performance is not something you add after the fact: you design for it from the start and don’t add things that will bog it down. Varnish’s code is written for today’s complex hardware. This isn’t designed for your dad’s computer anymore!

Use modern features:

  • Virtual Memory
  • sendfile(2), accept_filters(2), kqueue(2)
  • and every other trick in the book

The performance price list has changed: a CPU can execute 100,000,000+ instructions per second, while disk I/O lags far behind on response time, so cached objects live in virtual memory (RAM) and the kernel decides what is paged in. Classical logging is horribly expensive. (Examples in the slides.)

Where does my traffic come from?

Below are a few commands that you can use to access information on a live Varnish instance.

$ varnishtop -i Rxheader
$ varnishtop -i Rxurl
$ varnishhist

Varnishhist is a real-time histogram that shows cache misses (#) and cache hits (|) on an x & y axis.
$ varnishstat

Varnishstat pulls real-time statistics from shared memory.

Content Management Features

  • Instant action purges/bans (regex or exact match; see the example after this list)
  • TTL/Caching policy control in VCL
  • Load/Situation mitigation in VCL
  • Header washing
  • Vary
  • Edge-side-includes (ESI)
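As an example of the first item, a ban can be issued live from the management CLI. In Varnish 3 syntax it looks like this (Varnish 2 spelled it as a purge command):

ban req.url ~ "^/news/"

Every cached object matching the regex is invalidated before it is next served.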


BitTorrent SyncApp Alpha Testing

syncapp

Today, I received my invitation to alpha test BitTorrent SyncApp. I can only imagine the implications this will open up. Although Dropbox is safe, secure, and redundant, how safe are your files? SyncApp is interesting: I generate a secret key and designate my “SyncApp” folder. I fire up my laptop, install SyncApp.exe, and punch in my secret key. Immediately, over 2,800+ files in my WordPress localhost are synced to my laptop. This afternoon, it synced to a 3rd box. I am interested in taking this to the next step and placing my files on a Linux server in California… with all my private git repositories. I wonder how well git will behave with SyncApp?

I think the most fascinating implication of SyncApp that I can brew in my mind is utilizing it over a Meshnet. Think of tying several Meshnet nodes in Bellingham and giving end users the “secret key” – madness. I wonder if there’s anyone in Bellingham that has already thrown up Meshnet nodes?

The next plan of action with alpha testing BitTorrent SyncApp will be to tie all of my company files (.psd, .ai, .eps) between 3 computers and potentially a 4th. Perhaps my personal dedicated server down in California? Or maybe I should throw up an experimental Amazon EC2 instance with block storage? D: OMFG, way too excited.

Bash Script TimThumb Update [cPanel/WHM]

If you’re dealing with a large WordPress instance, I hope you have shell access. Using plugins like Timthumb Vulnerability Scanner on small installations is great; on large installations, however, the server might 503.

I had previously used bash scripts to detect outdated TimThumb installs with a simple grep command, outputting the findings to a .txt file which I could cross-reference during the update process. That has become cumbersome; I wanted to grab the updated TimThumb version from the Google Code repository and update the files in place. With a quick Google search, I found this simple script for cPanel users, which can be modified for your distro. Props to DropDeadDick.com for sharing his script. <3

[bash]#!/bin/bash
# Detects and updates timthumb.php to the latest version for all cPanel users.
# dropdeaddick.com

# grab the current version number from the Google Code trunk
latest=`lynx -source http://timthumb.googlecode.com/svn/trunk/timthumb.php | grep "define ('VERSION'" | cut -f4 -d"'"`

if [ -z "$latest" ]; then
    echo "could not get latest timthumb release, aborting!"
    exit 1
fi

# loop over every cPanel user (UID > 499) with a home directory
for user in `awk -F':' '{ if ($3 > 499) print $0 }' /etc/passwd | grep home | cut -d':' -f1`; do
    for file in `find /home*/$user/public_html/ -type f \( -name 'thumb.php' -o -name 'timthumb.php' \) 2>/dev/null | tr ' ' '%'`; do
        file=`echo $file | tr '%' ' '`
        check=`grep -c "code.google.com/p/timthumb" "$file"`
        if [ -z "$check" ]; then
            break
        fi
        if [ "$check" -gt "0" ]; then
            version=`grep "define ('VERSION'" "$file" | cut -f4 -d"'"`
            if [ "$version" != "$latest" ]; then
                echo -e "\e[1;31mWARNING version $version\e[0m updating $file!"
                # rm -f $file # delete current file before replacing.
                wget -nv -t3 -T3 http://timthumb.googlecode.com/svn/trunk/timthumb.php -O "$file"
                chown $user: "$file"
            else
                echo -e "\e[1;32mOK version $version\e[0m skipping $file"
            fi
        fi
    done
done[/bash]

I’d recommend creating an alias so that you can use it periodically. :]
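For instance (the path is just wherever you saved the script):

[bash]# add to ~/.bashrc; adjust the path to match where you keep the script
alias timthumb_update='bash /root/scripts/timthumb_update.sh'[/bash]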

Codecademy Google Chrome Web Store Shortcut

codecademy-chrome-store-shortcut

I am attempting to make it a habit to use Codecademy every day for brain food. I’d like to work through all the lessons and complete Codecademy’s Code Year track, and I want it front and center every time I open Google Chrome. I decided to quickly make a Chrome Web Store shortcut for convenience. If you’d like to use it too, feel free to download it.

How do you add this to Google Chrome? Simple: extract the contents of the .zip file to a nice location (I used Dropbox so I can put it on my other computers). Then go to the Menu icon in the top right corner -> Tools -> Extensions. You will see a button at the top, “Load unpacked extension…” – click that, navigate to the extracted folder, and hit Ok. Open a new tab and try it out.

You can use this base template for any other web application, but remember these shortcuts are local to your computer. If you have a desktop, a laptop, and a workstation, you’ll have to manually add it on each computer. Included is a .psd file if you’d like to make your own Chrome Web Store icons! :3
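If you’d rather build one from scratch, the template boils down to a hosted-app manifest plus an icon. A minimal sketch of such a manifest (the icon filename and description are placeholders; point the URLs at whatever web app you want):

{
  "name": "Codecademy",
  "description": "Opens Codecademy in a new tab",
  "version": "1.0",
  "manifest_version": 2,
  "icons": { "128": "icon_128.png" },
  "app": {
    "urls": [ "http://www.codecademy.com/" ],
    "launch": { "web_url": "http://www.codecademy.com/" }
  }
}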

Bonus DLC: Here is a link to the Facebook one I designed. This one looks a lot better than the crappy one currently available in the Chrome Web Store.

How To Scrape Google Cache With A Python Script

I was curious as to how one could scrape Google’s cache to recover a website that was recently taken down. Say, for instance, you’re a real estate agent and your website was terminated by your previous hosting company.

Guy Rutenberg wrote a great script in his blog post titled, “Retrieving Google’s Cache for a Whole Website” back in 2008, and has since been revised by curious Python programmers.

The latest revision was done by Thang Pham and is available at https://gist.github.com/3787790. Let’s look over the code real quick.

I fired up an Amazon EC2 instance, placed the python script in ~/python, and allowed the script to run for about an hour. I am not sure if Amazon or Google will rage, but eventually Google will block the IP and you’ll get a 503 error. Keep an eye on this so you don’t get it raging. You can always run the script again after the IP block is removed and it will resume where it left off.
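Something like the following keeps the script running after you disconnect from the instance (the filename is just what I’d call it; use your own):

cd ~/python
nohup python scrape.py > scrape.log 2>&1 &
tail -f scrape.log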

TL;DR: Change search_site near the top of the script to your target site. Then find the path assignment in main() and change the ‘\\’ separator to the one for your destination system; I used ‘/’.

[py]#!/usr/bin/python
# Retrieve an old website from Google's cache. Optimized with sleep times to
# avoid being blocked (Google throttles IPs that send too many requests).
# Programmer: Kien Nguyen - QTPros http://qtpros.info/kiennguyen
# Change search_site and search_term to match your requirement.
# Original: http://www.guyrutenberg.com/2008/10/02/retrieving-googles-cache-for-a-whole-website/

import urllib, urllib2
import re
import socket
import os, errno, os.path
import time
import random, math
#import MySQLdb
import imp

socket.setdefaulttimeout(30)
#adjust the site here
search_site = "qtpros.info"
search_term = "site:" + search_site

#mysql = imp.load_source("MySQLConnector", "mysql.py").MySQLConnector()
#mysql.connect('localhost', 'root', '', 'webscrape', True)

def mkdir_p(path):
    try:
        os.makedirs(path)
    except OSError as exc:  # Python >2.5
        if exc.errno == errno.EEXIST:
            pass
        else:
            raise

def main():
    headers = {'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4'}
    url = "http://www.google.com/search?q=" + search_term

    regex_cache = re.compile(r'<a href="([^"]*)"[^>]*>Cached</a>')
    regex_next = re.compile(r'<a href="([^"]*)"[^>]*><span[^>]*>[^<]*</span><span[^>]*>Next</span></a>')
    regex_url = re.compile(r'search\?q=cache:[\d\w-]+:([^%]*)')
    # regex_title = re.compile(r'<title>([\w\W]+)</title>')
    # regex_time = re.compile(r'page as it appeared on ([\d\w\s:]+)')
    regex_pagenum = re.compile(r'<a href="([^"]*)"[^>]*><span[^>]*>[^<]*</span>([\d]+)')

    #this is the directory we will save files to
    mkdir_p(search_site)
    path = os.path.dirname(os.path.abspath(__file__)) + '\\' + search_site
    # path = os.path.dirname(os.path.abspath(__file__))
    counter = 0
    pagenum = int(math.floor(len([name for name in os.listdir(path)]) / 10) + 1)
    max_goto = 0
    more = True

    #resume from the results page we stopped on during the last run
    if pagenum > 1:
        while max_goto < pagenum:
            req = urllib2.Request(url, None, headers)
            page = urllib2.urlopen(req).read()
            goto = regex_pagenum.findall(page)
            # print goto
            for goto_url, goto_pagenum in goto:
                goto_pagenum = int(goto_pagenum)
                if goto_pagenum == pagenum:
                    url = "http://www.google.com" + goto_url.replace('&amp;', '&')
                    max_goto = pagenum
                    break
                elif goto_pagenum < pagenum and max_goto < goto_pagenum:
                    max_goto = goto_pagenum
                    url = "http://www.google.com" + goto_url.replace('&amp;', '&')
            random_interval = random.randrange(5, 20, 1)
            print "sleeping for: " + str(random_interval) + " seconds"
            print "going to page: " + str(max_goto)
            print url
            time.sleep(random_interval)

    while more:
        #send the search request to google with pre-defined headers
        req = urllib2.Request(url, None, headers)
        #open the response page
        page = urllib2.urlopen(req).read()
        #find all cache links in the page
        matches = regex_cache.findall(page)
        #loop through the matches
        for match in matches:
            counter += 1
            #find the url of the page cached by google
            the_url = regex_url.findall(match)
            the_url = the_url[0]
            the_url = the_url.replace('http://', '')
            the_url = the_url.strip('/')
            the_url = the_url.replace('/', '-')
            #if href doesn't start with http, insert http before
            if not match.startswith("http"):
                match = "http:" + match
            if not the_url.endswith('html'):
                the_url = the_url + ".html"
            #skip files we have already downloaded
            if not os.path.exists(search_site + "/" + the_url):
                tmp_req = urllib2.Request(match.replace('&amp;', '&'), None, headers)
                try:
                    tmp_page = urllib2.urlopen(tmp_req).read()
                    f = open(search_site + "/" + the_url, 'w')
                    f.write(tmp_page)
                    f.close()
                    print counter, ": " + the_url
                    #comment out the code below if you expect to crawl fewer than 50 pages
                    random_interval = random.randrange(15, 20, 1)
                    print "sleeping for: " + str(random_interval) + " seconds"
                    time.sleep(random_interval)
                except urllib2.HTTPError, e:
                    print 'Error code: ', e.code
                    pass
        #now check if there are more pages
        match = regex_next.search(page)
        if match is None:
            more = False
        else:
            url = "http://www.google.com" + match.group(1).replace('&amp;', '&')

if __name__ == "__main__":
    main()[/py]

Thanks Guy Rutenberg and Thang Pham for this great python script! You’re both life savers!