DirectoryScheme with Ruby

A while ago I needed to write some code that wrote a full directory tree and auto-wrote files, much like the Rails Scaffolding helper. 

I didn't like the way that Ruby handled writing Directories, it seemed like a lot of effort to write a Directory tree since you would have to create a Hash of tree nodes and then iterate and build the directory as you went, this is fine, but if you have a deeply nested structure it is easy to provide strings to create deeper mappings. 

Here is the code and usecase

The usage example is fairly simple, but gives you an idea of what you could do with this knowledge.

Happy Hacking!

Build an External Solid State Drive for under $300 bucks

(download)

I was looking online to see when the first external SSD (solid state drives) would be coming to market. They are here, but if you are looking for an external SSD with Firewire 800, then that will cost you between $500 - $1000. ( Ex: http://www.memory4less.com/m4l_itemdetail.aspx?partno=909200-01&itemid=14... )

However, if you think about it, you can build an external solid state drive with firewire 800 / firewire 400 / usb 2.0 for under $300. 

 

What you need. 

1. Intel SATA Solid-State Drive (320 Series - 120GB : $180 )

2. External 2.5inch Enclosure with Firewire 800/400/USB2.0 - Acomdata TangoPro ( $52 )

3. Mini Screwdriver ( Tiny Turner ~$5 )

So the parts are a little over $240 with tax, but within 10 minutes or less, you'll have a portable SSD drive.

 

Note: The Acomdata TangoPro doesn't come with a 5V DC Adaptor, but if you're like me, you probably have a drawer full of random adaptors from past drives, and computer periferals. 

 

That's that, now you can bring your SSD with you.

Prime Numbers Case Study. Number Theory vs Brute Force

I find it very interesting that numbers, generally speaking, seem like such simple entities. eg. Numbers help you represent items in sets, groups, and collections, and over the years have helped humans evolve from a primitive to highly evolved species . Numbers can be viewed as simple objects, but there is a whole world of hidden complexities that lie in the heart of Number Theory. 

During the last month, I came across a very interesting bit of Number Theory, an algorithm for calculating Prime Numbers called The Prime Number Sieve of Eratosthenes. Erathosthenes was an ancient Greek mathematician. 

The concept of the algorithm is simple. If you want to find all prime numbers below a limit - say 2 million - Starting with 2, any factor of two up to 2 million can't be Prime, moving on, each factor of any Prime Number can't be prime. 

So starting with 2, upto 200, you knock out 100 (200/2=100), at 3 to 200, you have 66 factors of 3 in the set, at this point 4 is knocked out since 4/2 = 2 = prime, at 5 you have an additional 40 factors in the set of 200. Where there is a factor of a prime number, the factor can't be prime. 

The Sieve Algorithm

In computing, a sieve is a collection of values, generally in a dictionary or array. You initialize your sieve to be true for all members in the Range you want to check, for this example, is a number prime. 

 limit = 200
 limitn = limit+1
primes = []
 (0..limitn).each{|val| primes[val] = true} 

Once you have a sieve to work with. You can use number theory and Ruby to find your prime numbers.

 primes[0],primes[1] = false 
primes.each_with_index do |prime,i| 
 unless i > 2 
  range = Range.new(i,limitn)
   range.step(i) do |index| 
   primes[index] = false
   end
  end
 end 

At this point, the Array index of primes where the value in the Array is true is a Prime Number.

 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199] 

Did I mention that this saves a ton, I mean a ton of CPU time? It literally saves you hours and days of computation, since you don't have an O(n**2) relationship where you would with the Brute Force alternative. Typically if you want to see if a number is prime, you ensure that it is divisible by 1 and itself only. This means you have to calculate every number up to the number you are testing for Prime. For 2000, you have to test 2000-2 times to check if 2000 is a prime number.

Full Source Code with Benchmarking Scripts

Prime Sieve Code on Github.

 #!/usr/bin/env ruby    # Primary Numbers using Number Theory  # Sieve of Eratosthenese    # Using Array  # Fastest  def prime_sieve(limit)     limitn = limit+1   primes = []     # Fill in prime truth table   for i in 0..limitn do    primes[i] = true   end     primes[0] = false   primes[1] = false     primes.each_with_index{|prime,i|    unless i < 2     range = Range.new(i*i,limitn)     range.step(i) {|index| primes[index] = false}    end   }     true_primes = []   primes.each_with_index{|value,i|    true_primes << i if value == true   }     return true_primes    end    # Slower  def prime_sieve_hash(limit)   limitn = limit+1   primes = {}     #populate hash with num:true   (2..limitn).each{|val|    primes[val] = true   }     # iterate through primes dictionary   primes.each{|k,v|    # we know that for every number (starting with 2)    # that every factor of 2 upto the limitn, will not be a prime number    range = Range.new(k*k,limitn)    range.step(k) {|index| primes[index] = false }   }     # create primes container   true_prime = []     # collect prime   primes.each{|k,v|    true_prime << k if v === true   }     return true_prime    end    # Traditional Brute Force Prime Number Generator    def brute_force_primes(limit)     primes = []   complexity = 0   # we know that 0,1 can't be prime   # start at 2 and create a Range upto the limit   (2..limit).each{|number|    complexity += 1    is_prime = true      # any number is divisible by 1    # so start at 2    (2..number).each{|n|     complexity += 1     # ensure the number being tested is not the number in the loop     unless number == n      # continue unless the local loop value of n is a factor of number      unless number % n != 0       is_prime = false       break      end     end    }      primes << number if is_prime   }   puts "O(n**2): #{(limit**2).to_s}"   puts "Complexity: #{complexity.to_s}"   return primes    end    def benchmark &block   tstart = Time.now   block.call   tend = Time.now   puts "Running Time: #{(tend.to_f - tstart.to_f).to_s}"  end  

Using the code above. You can run tests on number sets or just collect the actual prime numbers.

Usage:

 p = prime_sieve(200)  puts p.inspect    primes = prime_sieve_hash(200)  puts primes.inspect    primes2 = brute_force_primes(200)  puts primes2.inspect    puts "\n\n\n"  puts "PRIMES SIEVE: "  benchmark{ prime_sieve(900000) }    puts "\n\n\n"  puts "PRIMES SIEVE HASH: "  benchmark{ prime_sieve_hash(265000) }    puts "\n\n\n"  puts "BRUTE FORCE: "  benchmark{ brute_force_primes(16200) }  

You may have noticed that I populated the benchmarking tests with different values. When this runs, you will find all of the benchmarking times to be about the same. I found it interesting to see that in the same time it takes Brute Force to calculate 16,200 Prime Numbers, we can find 900,000 with the Sieve of Eratosthenese.

Happy Hacking

How to build a Universal Unique Identifier

Universally Unique Identifiers (UUID) are common when building any sort of distributed system. More often than not, you will find the need to use UUIDs when building any sort of RESTful API. Since REST (Representational State Transfer) relies on the State of a particular Event. For example, ( /api/user/423455/status ) could be used to capture the State of a user's registration in your system ( unverified, verified, paid up, deadbeat... etc ) you get the picture.

Now, say for example, you are doing something that has the need for REST, not just CRUD (create (post), read (get), update (put), destroy (delete)), say your building a video transcoding api (like zencoder ). You would want to be able to update a user on the Status of a particular video's process. Here is where a UUID would come in handy.

For example: ( /api/transcoder/video/uuid/status ), could return a json string

{"video": 

  {"format": { "out_type" : "mov" , "vcodec" : "h264" , "acodec" : "AAC" }},

  {"status" : { "message" : "processing", "percent_complete" : 65}

}

This would allow you to poll the server at given intervals, and allow you to update your GUI to represent the current percentage of that particular video, out of say millions of different videos.

Now, let's take a look at the code to make a UUID.

source at github. universal_unique

 #!/usr/bin/env ruby
 require 'digest/md5'
 
 module Crypto
  module Universal
 
   class Unique
 
    attr_accessor :unique
 
    # generate unique salt
    def initialize(salt=nil,params={})
     @salt = salt.nil? ? generate_salt : salt
     generate_randoms
     self.unique = generate_unique
    end
 
    # generate random alpha numerics
    def generate_randoms
     @alphas = ("a".."z").collect{|alpha| alpha }
     @numerics = (1..9).collect{|num| num }
    end
 
    def generate_random_alpha_numeric
     tmp = rand(2)
     if tmp == 0
      @val = @alphas[rand(@alphas.size)]
     else
      @val = @numerics[rand(@numerics.size)]
     end
     #puts @val.to_s
     return  @val
    end
 
    def generate_unique
     #x is hexidecimal
     yvals = [8,9,"A","B"]
     # ........ - .... - 4... - yvals(rand(yvals.length))...-............
     chunks = 1.upto(8).collect{|i| generate_random_alpha_numeric.to_s }.join("")
     chunks << "-"
     chunks << 1.upto(4).collect{|i| generate_random_alpha_numeric.to_s }.join("")
     chunks << "-4"
     chunks << 1.upto(3).collect{|i| generate_random_alpha_numeric.to_s }.join("")
     chunks << "-#{yvals[rand(yvals.size)]}"
     chunks << 1.upto(3).collect{|i| generate_random_alpha_numeric.to_s }.join("")
     chunks << "-"
     chunks << 1.upto(12).collect{|i| generate_random_alpha_numeric.to_s }.join("")
     self.unique = chunks
    end
 
    def generate_salt
     return Digest::MD5.hexdigest(rand(9999).to_s)
    end
 
   end
 
  end
 end
 

There you have it. You can now easily generate UUIDs. This is using the version 4 method for the Universal Unique ID algorithm.

 require 'lib/crypto/universal/unique'
 unique = Crypto::Universal::Unique.new().unique
 puts unique
 

This will return the following style UUID: 3qdsrgir-cd7l-43nl-8vk9-23t6suw57rs3

Happy Hacking

 

 

 

Speaking at Silicon Valley Code Camp 2011

WebSocket I/O with Ruby EventMachine

I will be speaking on either October 8th, or 9th 2011. Dates, Times are to be released soon

WebSocket I/O with Ruby EventMachine

Overview

In this session, I will cover using Ruby EventMachine to create a real-time - event driven - web application. This session will cover setting up an EventMachine server, Registering endpoints (unique websocket connections), and handling live or autonomous communication.

Session will cover following ruby technologies

 

  1. RVM - using rvm gemsets with .rvmrc RVM
     triggering use of custom rvm gemsets from within github repo
  2. Gemfile (for bundle install) (gem install bundler)
  3. Ruby EventMachine
  4. em-websocket, em-http-request gems
  5. JSON object serialization
  6. Concurrent Programming in Ruby
  7. Autonomous Communication

 

Session Application

During the course of this session, we will build a real-time, multi-threaded communications platform that can connect to any socket based endpoint, including another user, or HTTP API, or custom endpoint, or protocol.

 

  1. Utilize html5 websockets and Json messaging to build an event-driven communications platform
  2. Drive socket based communication utilizing the Observer and Reactor patterns
  3. Use blocks (&block) to deliver http responses - from any 3rd party rest API - from your eventmachine server, back down through a websocket to the user who generated the request, or on time-based cycles.
  4. Learn how to use Modules (mix-ins)
  5. Build configuration files with YAML
  6. Accept Command-Line arguments using ARGV
  7. Have fun with other Ruby geeks!

 

Please come to Silicon Valley Code Camp 2011, it is Free, and a great way to learn new things from some of the smartest engineers in the Silicon Valley

CodeCamp at FootHill College.

Binary Search and Linear Search algorithms

Binary Search

Binary Search algorithms allow us to quickly iterate over large ranges to find whether or not a value exists. This is useful for seeing if a value of say 100 million exists in a specified range of 0 to 2 billion. The reason why this is useful is because we are actually doing a bisectional search.

Bisectional searching means that we are taking our entire range (for each search) and dividing it in half. We can then test to see if our search value exists in the last position of the first half of our search, or in the first position of the second half of our search. In essence, we could throw out 1 billion useless tests in a range of 2 billion. (This as you may guess significantly reduces the overhead of our search).

So let's look at the code

  def binary_search(range,value)
  # binary search algorithm
  # chop range in half, test if value is greater, less then chunk[0], chunk[1]
  # continue test on chunk that passes condition, until 2 values left
  $loops += 1 # adds about .001 more time to algorithm
  #@range = range.each.map{|val| val} # adds about 50% more time to algorithm
  @range = range.to_a
  #puts "Size of range: #{@range.size.to_s}"
  unless @range.size <= 2
   # chop range in half
  # test if value is in first or second chunk
   ranges = []
   ranges << @range[0,(@range.size/2)]
   ranges << @range[(@range.size/2),@range.size]
   #puts "size of ranges[0]
 #{ranges[0].size.to_s}"
   #puts "size of ranges[1]
 #{ranges[1].size.to_s}"
   # test if value is less than or equal to first value in ranges[1] or last value of ranges[0]
   if value >= ranges[1][0]
    #puts "#{value.to_s} should be searched for in ranges[1]"
    #puts ranges[1].inspect
    @upper = ranges[1].last
    @lower = ranges[1].first
   else
    #puts "#{value.to_s} should be searched for in ranges[0]"
    @upper = ranges[0].last
    @lower = ranges[0].first
   end
   #puts "Check Range of #{@lower.to_s} .. #{@upper.to_s}"
   #puts "Loop #{$loops.to_s}"
   # recursivly check against new parameters
   binary_search((Range.new(@lower,@upper)),value)
  else
   # we have two values left, do comparison
   puts "search #{@range.inspect} for #{value.to_s}"
   puts "range only has 2 values in it, manually compare against value"
   if @range.first == value
    puts "value is in set: #{@range.first.to_s} == #{value.to_s}"
    return true
   elsif @range.last == value
    puts "value is in set: #{@range.last.to_s} == #{value.to_s}"
    return true
   else
    puts "value is not in the set: #{value.to_s}"
    return false
   end
  end
 end 

So, in the code above (binary_search.rb source on github.com), we take a Range of Integers as our first parameter, and take a value to search against as our second parameter. We start off by turning our Range into an Array so we can test the values within it. Next, we assume that we need to continue cutting our Search Array in half until we have 2 or less values left in the Array.

If we have more than 2 elements in our Array, we divide our Array in half, and check if the last Array element in the first half of our bisected Array is less than our search value, if not, we can assume that our value is in the second half of our new Array. (ranges[0] or ranges[1]). Next we recursively test the range where we presume our search will result in a match, and we continue to break down our search until we only have 2 elements left in our search Array.

Once we only have 2 elements left in our search, we can manually search the first and last element within that Array, and we either have a match, or we can successful return false because there is nothing left to test.

This is huge. We can literally reduce the overhead of each test logarithmically, meaning we can cut off 1/2 the search each time we iterate through our loop.

Linear Search

Linear search algorithms are necessary when we have to iterate over every element in our Range, however, where we can use Binary Search (bisectional search algorithms), we should use the later.

Linear algorithms in search must iterate through every element in the Range leading up to a successful match, or it should return false.

Let's look at the code

 def linear_search(range,value)
  @range = range.to_a
  # iterate through start to end of range
  @range.each do |comparison|
   # return true if we have a match
   unless comparison != value
    $tests += 1
    #puts "Total Tests: #{$tests.to_s}. matched: #{comparison.to_s} == #{value.to_s}"
    return true
   else
    $tests += 1
    #puts "Total Tests: #{$tests.to_s}. No Match. test again"
   end
  end
 return false
 end   

As you can see above, the logic for linear search (Linear Search algorithm on Github) is much less complex. All we do is loop through our Range from start to finish, or until we find a match. This is nice if our search element is at the beginning of our Range, however, we are dealing with 50% more overhead than we would with our Binary Search algorithm, since we are not dealing with a linear algorithm vs a logarithmic algorithm.

Test Results running in Ruby 1.9.2 for Range 0 - 20 million, search target of 1000

binary_search: Total Time: 2.3716108798980713 seconds

linear_search: Total Time: 3.402621030807495 seconds

Test Results running in Ruby 1.9.2 for Range 0 - 200 million, search target of 200 million

binary_search: Total Time: 2.3795228004455566 seconds

linear_search: Total Time: 3.4161128997802734 seconds

Test Results running in Ruby 1.9.2 for Range 0 - 2 billion, search target of 200 million

binary_search: Total Time: 23.50186014175415 seconds

linear_search: Total Time: 13.934978246688843 seconds

Note: Ruby's internal each iterator is fast. Depending on computer performance, the linear search algorithm can keep up if not surpass the binary search algorithm, however, we can shed seconds

Finding Prime Factors of a Gigantic Composite Number

For the past few months I've been trying to solve the word problems on the Project Euler (pronounced Oiler, so I found out today!), but these problems will truly prep you for thinking outside the box. Problem #3 required you to find the largest prime factor of a composite number (600,851,475,143).

The Breakdown

So let's break it down. What are they asking here?

First off these guys are trying to screw with you, but all they are really asking is what prime numbers are evenly divisible by the main composite number. (From the site: ) The prime factors of 13195 are 5, 7, 13 and 29. 13195 is evenly divisible by 5,7,13,29 which results in following (after running through my algorithm).

1 Wave (TEST_NUMBER: 13195
RESULTS SET: [2639.0, 1885.0, 1015.0, 455.0]
Prime Factors: [5, 7, 13, 29]

What is really cool about prime factors of a composite number? Well if you multiply each prime factor by themselves (eg. 5*7*13*29) you get the composite number of 13195.

So if we can solve for 13195, why not solve for a number over 600 billion? This was the challenge. I didn't know how many prime numbers I would need to include in my @primes Array, so I did what anyone would do and decided to find a number that when squared would be around 600 billion, my calculations were off, but that didn't matter, I chose to grab all the prime numbers up to 775,000. (600,851,475,143 / 775,000) = 775,292.226 - close enough.

Calculating the Primes

So I took different approach to collecting my data for this equation. Sure i started off looking online to try and find a nice csv file with all the prime numbers I wanted, but unfortunately most I could find were up to 1000, and that isn't nearly enough. So I wrote the code to find the prime numbers.

gather_prime_numbers.rb source code
important piece of the puzzle (snippet)

  def check_for_prime(current,total)
  divisible_by_self = false
  divisible_by_one  = false
  divisible_by_none_other = true
  for j in (1..total) do
   if (current.to_f % j.to_f) == 0
    if j == 1
     divisible_by_one = true
    elsif j == current && current != 1
    divisible_by_self = true
   else
     divisible_by_none_other = false
   end
  end
  end
 # prime numbers are only divisible by 1 and themselves, so @divisible_by_none_other.should == true
 if divisible_by_self == true && divisible_by_one == true && divisible_by_none_other == true
 #@primes << current
  return current
 else
  return nil
  end
 end
 

So what our little algorithm is doing is this, it is testing that each number in our set is only divisible by itself and one, and returning the number as a prime number or as a nil.

 

So where does this get us? If you have nothing but time on your hands and a great solid state macpro with 8GB RAM (I'm smiling cause the company gave me just such a computer) then have at it. Run the following code from the command line using the gather_prime_numbers.rb code:

> ruby gather_prime_numbers.rb 0 200
--------------------------------------------

  Start 0 - LIMIT 200
 Current: 100  Primes: 0
 PRIMES:  [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
 Benchmark: 0.003527  Ran 1 Times
 Current: 200
 Primes: 25
 PRIMES:  [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199]
 Benchmark: 0.006756
 Ran 2 Times
 
--------------------------------------------

Now for fun. Let's raise the stakes

> ruby gather_prime_numbers.rb 0 45000
(* note to self: You will reach 99% CPU overhead, vs a fair 4% or less of RAM consumption)
When the application starts to cool down you'll be right around the 450 run times block.

Benchmarking Data

Benchmark: 1.495103 (s) (Current Completed cycle of 1000 numbers tested, filtered down to just the primes)
Operation Time: 341.3632109999999 (s) (total time since the algorithm started running)
Ran 449 Times (number of cycles of 1000 numbers checked to find the presence of prime numbers)
Current: 45000 (last number we ended on)
Primes: 4665 (total primes in the set from 0 to 45000

Note: As the primes get larger the time to calculate goes up parabolically. The number of calculations per prime number test is n*n-1 (ex: find prime number of 4500, it takes 20,245,500 calculations (4500*4499)). So it is truly incredible when you think about the numbers up towards the 750,000+. (750,000 * 749,999 = 562,499,250,000 calculations per test), since we are running sets of 1000, we end up churning out 5.62e14 calculations per set of 1000. So after a million, you'd want to be running parallel sets, or using a compiled language. 

What are we getting at?

Over the course of a weekend. I let this algorithm just trudge along and after 3 days of my macpro churning at 99% CPU, I had finally gotten the data set I needed, that being the prime numbers up to 750,000. (Note* 750,000 was easier for me since in theory since in the first example (composite = 13195, we only needed to go up to 29), so the rest of the numbers leading up to the full 775,000, would have only really provided an extra 25,000 numbers to test for primes for this main set. My logic here is that there was an average of around 60 primes per set of 1000, so we are looking at only around 1,500 of so lost prime candidates.

Lucky for you guys, I did all the heavy lifting and now you can just take it from here.
Prime Numbers up to 750,000 in CSV format

Well we've gone on over and broken down the word problem into its simple parts. We know that we need 1.) A set of prime numbers as potential prime factors for our target composite number. 2.) We also know that in order to have a set of prime numbers we must have data to work with (which we have now)

Solution

solution files on github

(Spoiler Alert) Solution to Project Euler # 3

  #!/usr/bin/ruby
 # PROBLEM 3 - PROJECT EULER
 
 # grab csv file into Array chunks


File.open(File.dirname(__FILE__)+"/primes_to_750000.csv") do |f|


 @f = f.readlines
 end
 
 # strip white spaces
 @primes = @f[0].split(',')
 @primes.collect! do |n|
  n.gsub(/ /,'').to_i
 end
 #puts @primes.inspect
 @count = 0
 
 # grab results and return to calling method
 def run_recursive_loop &block
  results = yield
  return results
 end
 
 # main logic, tests for prime factors
 def divide_and_find_prime_results(start_number,info_set)
  @next_results = []
  @factors = []
  # try to divide start_number by all members of info_set
  info_set.each {|prime|
  @tmp_result = (start_number.to_f/prime.to_f)
  #puts "START NUMBER: #{start_number.to_s} / PRIME: #{prime.to_s} #{@tmp_result.to_s}"
  if (@tmp_result % 2) == 1
   #puts "FINALLY: #{prime.to_s}"
   @next_results << @tmp_result
   @factors << prime
  end
  }
  @count += 1
  return {"set"=>@next_results,"prime_factors"=>@factors}
 end
 
 # used to display information on each set of tests against the target composite number
 def run_til_none_left_to_run(test_number, start_set)
  r = run_recursive_loop { divide_and_find_prime_results(test_number,start_set) }
  puts "#{@count.to_s} Wave TEST_NUMBER: #{test_number.to_s}\nRESULTS SET: #{r['set'].inspect}\nPrime Factors: #{r['prime_factors'].inspect}"
 end
 
 @cap = 600851475143 #13195
 
 def benchmark(&block)
  start_benchmark = Time.now
  yield
  end_benchmark = Time.now
  t = end_benchmark - start_benchmark
  puts "TOTAL TIME: #{t.to_s}"
 end
 
 benchmark{ run_til_none_left_to_run(@cap,@primes) }
 

 

So at this point you are seconds away from figuring out the actual meaning behind project oilers (eulers) problem. > prime_factors_3.rb
and your off.

I do these series because it helps me with my own understanding of mathematics, if you or anyone else has a better trick, or can figure out how to cut the cpu overhead from my prime number calculator, I'd be thrilled.

Happy Hacking

RTP (Real-Time Transport Protocol) with Ruby

RTP History

Audio and Video networking is what enables voice-over-IP (VOIP), telephony, teleconferencing, streaming video and webcasting. The quest for sending audio and video data over a packet network began in the early 1970s, despite all the buzz over GTalk and Apple's Facetime today, the quest to get here has been a 40 year trip.

For more information on History, Packetization, and RTP you have to check out RTP: Audio and Video for the Internet, the book changed my understanding of the logic behind streaming audio/video/data)

RTP and Ruby

When I first started working with RTP, I admit I was a little giddy, since the idea of being able to create homegrown streaming systems was this unexplored frontier for me. Here was my strategy for understanding enough to get going, and some tips for you if your new to the space, or new to RTP with Ruby.

Tip #1. Read a good book on the RTP protocol

It is very easy to tell yourself that there is no reason to buy a book when there is a wealth of knowledge online, but just bite the bullet and get one. The only real alternative is the read the RFCs (http://www.ietf.org/rfc/rfc1889.txt), but unless you have a good degree in Computer Science, and are used to reading technology white papers, your're in for some more hair loss.

Tip #2. Wireshark is your friend

Wireshark is a fantastic tool for doing real-time, packet capture on your system. It is fairly easy to use, you just have to know what data you are looking for.

I was working on capturing SIP packets over UDP, so I could do the following: Capture Interfaces > eth0 > filter options (UDP only) > Start.

Once Wireshark was streaming, I would use my Xlite Sip Phone (X-Lite) using the free sip registrar Sip2Sip.info. I would fire up my softphone, call myself from another computer using the same phone (different sip username), and could capture the sequence of sip messages, followed by the video/audio over RTP/UDP. This was invaluable, cause without data that you can relate to, it becomes a taxing process when your analyzing the same data with ruby.

What I would do with Ruby was more proof of concept development that allowed me to build a threaded SIP UserAgent with EventMachine and Sinatra.

Example #1. Parsing RTP Packets with Ruby

Without going into detail about how we get these RTP packets, I will explain how to parse the headers and look for RTP packets that are either H263,H263-1998, or H264 video packets. (As for getting the packets, you can use wireshark, or setup a telephony server (Freeswitch / Asterisk) and overide the SDP in your call session to pass RTP packets back to an instance of EventMachine (or Python Twisted) or a Simple UDP Socket Server, to capture the RTP packets / Data Payload to parse with this script.

Script available on GitHub. https://github.com/newfront/dailyhacking/blob/master/media/rtp_header_parser.rb

it's worth noting at this time that all of the code we've talked about today is all Ruby 1.9.2+. If you are still working with ruby 1.8 you are going to have to update the lines method for the string prior being transformed to array. For example, "808032243773400789898".lines.to_a works to turn this Sting into an Array, however if you are not using Ruby 1.9.2 you will notice that the interpreter will fail. =

Unix Tricks to ease your stress levels

I am currently working on a project that required downloading tons of separate tar.gz files. Most were patch files to get a server up to speed.

I just wanted to share this quick trick, since it saves you the hassle of having to tar each file individually

Untarring an entire directory of Tar files

Move to the directory with the files you want to untar. (cd /path/to/source)

 for file in *.tar.gz
 > do
 > tar -xvzf $file
 > done
 

That is it. 4 lines that could save you time

Happy Hacking