Archive for the 'Binary Searches' Category

PHP OOP & Algorithms I: An Introduction

quadAvoiding the Misinformed

Programmers often spend more time un-doing bad information than using good information. One of the comments that set me off recently was someone “explaining” to others on a blog why PHP was not an object oriented language. Then he continued to blather on about the difference between compiled and interpreted languages. Whether or not a language is compiled or not has nothing to do with whether or not it is an object oriented language. Having interfaces, classes and communication between objects are the key criteria of an OOP language, and certainly since PHP5 has been a full-fledged OOP language. (We PHPers should not feel singled out because I recently saw post where a Java programmer pronounced that neither Python nor Perl were OOP, and she was “informed” otherwise by irate Python programmers. Perl has been OOP since V5.) So here I am again wasting time grumbling about people who don’t know what they’re talking about.

Instead of frothing at the mouth over the misinformed, I decided to spent more time with the well-informed. To renew my acquaintance with algorithms I began reading Algorithms 4th Ed. (2011) by Sedgewick and Wayne. Quickly, I learned some very basic truths about algorithms that had been only vaguely floating around in my head. First and foremost are the following:

Bad programmers worry about the code.
Good programmers worry about data structures and their relationships.
Linus Torvalds (Creator of Linux)

Since we’ve been spending time on this blog acting like good programmers, that was reassuring. In this post, I’d like to look at two things that are important for developing algorithms: 1) What to count as a “cost” in developing algorithms, and 2) Identifying good and bad algorithmic models. First, though, play and download the example. Using two different algorithms, a logarithmic and a linear (both pretty good ones), I’ve added “dots” to loop iterations to visually demonstrate the difference between a logarithmic algorithm (binary search) and a linear algorithm (loop). The “expense” of the algorithm can be seen in the number of dots generated.
PlayDownload

The example is a pretty simple one. However, since this blog is about PHP Design Patterns, I added a little Factory Method. The two algorithm classes act like clients making requests through the factory for a big string array with over 1,000 first names. Figure 1 shows the file diagram:

Figure 1: File diagram for use of Factory Method by two algorithm clients.

Figure 1: File diagram for use of Factory Method by two algorithm clients.

In looking at the file diagram, you may be thinking, “Why didn’t you use a Strategy pattern coupled with that Factory Method?” I thought about it, but then decided you could do it yourself. (Why should I have all the fun?)

Lesson 1: Leave the Bag of Pennies

The first lesson I learned in Bank Robbery 101 was to leave the bag of pennies. They’re just not worth it. Speed is everything in a bank robbery, and so you pay attention to how to get the most with the least time. The same thing applies to analyzing algorithms. For example, an object (as compared to an integer, boolean or string) has an overhead of 16 bytes. I have actually seen posts intoning, “objects are expensive…” Just to be clear,

Objects are not expensive. Iterations are expensive, quadratic algorithms are expensive.

In evaluating an algorithm you need to see how many operations must be completed or the size and nature of the N. An N made of up strings is different than an N made up of Booleans or integers. A quadratic (N²) and cubic (N³) algorithm are among the worst. They’re gobbling up kilo- or megabytes, and so that 16 bytes seems pretty silly to worry about. So instead of seeing an algorithm weight expressed as N² + 84 bytes, you’ll just see it expressed as ~N². (When you see a ~ (tilde) in an algorithm, it denotes ‘approximately.’) Another way of understanding the ~ is to note, They left the bag of pennies.

Lesson 2: Watch out for Nested Loops; they’re Quadratic!

I’ve never liked nested loops, and while I admit that I’ve used them before, I just didn’t like them. They were hard to unwind and refactor, and they always seemed to put a hiccup in the run-time. Now I know why I don’t like them; they’re quadratic.

Quadratic algorithms have the following feature: When the N doubles, the running time increases fourfold.

An easy way to understand the problem with quadradics is to consider a simple matrix or table. Suppose you start with a table of 5 rows and 5 columns. You would create 5² cells—25 cells. Now if you double the number to 10, 10² cells = 100. That’s 4 x 25. Double that 10 to 20 and your have 20² or 400. A nested loop has that same quality as your N increases. If both your inner and outer loop N increases, you’re heading for a massive slowdown.

Algorithms, OOP and Design Patterns are Mutually Exclusive

An important fact to remember is that good algorithms do not guarantee good OOP. Likewise, good OOP does not mean good algorithms. Good algorithms make your code execute more efficiently and effectively. Good OOP makes your programs easier to reuse, update, share and change. Using them together is the ultimate goal of a great program.

PHP Skip Lists 1: The Quickest Route Home

skiptownA few years ago I created a post on Skip Lists (different blog), and several people told me that it clarified the concept of skip lists for them; so I’m reproducing it here for PHP developers. Using the analogy of rail travel in the New York-Hartford-Boston area, I show how the skip list can work in the world apart from programming. Living in Bloomfield, Connecticut, I’m about halfway between Boston and New York City. Lately they’ve been talking about building a high speed rail to Hartford and on to Springfield, Massachusetts. Naturally, when thinking about such a rail system, I like to think that the really fast part of the trip would be between New York City, Hartford, and Boston. Of course they’d need an express to stop off in New Haven and Springfield. Further, lots of other towns, including Bloomfield, would need some kind of local line. Left to my own devices, I would build the following lines:

Bullet

  1. New York
  2. Hartford
  3. Boston

Express

  1. New York
  2. New Haven
  3. Hartford
  4. Springfield
  5. Boston

Local

  1. New York
  2. Stamford
  3. New Haven
  4. Middletown
  5. Hartford
  6. Bloomfield
  7. Springfield
  8. Boston

Given these choices, I know that I can easily get to Bloomfield by taking the Local at Grand Central Terminal in New York City. However, I’d really like to skip stops in Stamford, New Haven and Middletown. If I took the express, I’d have a stop in New Haven and then on to Hartford where I could get on the Local to Bloomfield. That would be faster, but it would not be the fastest route.

To determine the fastest route, assuming that I have to go in the same direction and cannot backtrack, I diagramed the three lines in Figure 1.

Figure 1: Three train lines between NYC and Boston

Figure 1: Three train lines between NYC and Boston

Looking at Figure 1, it is not too difficult to determine the quickest route between NYC and Bloomfield.

The route that skips the most stops is the quickest route.

Trace your finger from NYC to Bloomfield in Figure 1 and you can determine that route. To see if you found it, look at Figure 2 to see the fastest route highlighted in yellow.

Figure 2: The Route that Skips the Most Stops

Figure 2: The Route that Skips the Most Stops


If you traced the route highlighted in yellow in Figure 2; then you understand how skip lists work. In a similar way, the Binary Search discussed here does not start at the beginning and iterate through an entire ordered list. Rather it splits the list determining the quickest way to the search item.

From Train Lines to Linked Lists

To go to the next step, you need to understand Linked Lists which we discussed in a recent post. What I’d like to do is to transform our train lines into a linked list. The first step is to change the different towns on the lines to nodes in a linked list. Figure 3 shows the first step in this transformation:

Figure 3: From Train Stops to Nodes in a Linked List

Figure 3: From Train Stops to Nodes in a Linked List

My hunch is that if I asked you what was the quickest route to find the value “30” you’d know just what to do. However, the diagram still looks like a train dispatcher’s chart, and so we’ll update it a bit to look more like a linked list. Figure 4 shows this new diagram.

Figure 4: Linked list with multiple pointers<

Figure 4: Linked list with multiple pointers<[/caption] Figure 5: Node with multiple pointers Figure 5: Node with multiple pointers

The arrows now have balls on their tails and look like pointers—and we know that pointers hold a reference to the next node. If we have multiple pointers with some of our nodes, we can have a single linked list that includes pointers that skip certain nodes. Figure 5 shows how that might look in the first node:

Keeping in mind that the linked lists are ordered (sorted) our search for the value “30” would look something like this:

  • Starting point would be on the shortest linked list: 30 is > 25 but < 53.
  • Drop to next level down to the second shortest list: 30 is < 43.
  • Drop to next level down to the third shortest list: 30 == 30. Your search is over.

Now we’re ready to deal with coding skip lists in PHP5. We’ll have to work out the algorithms, and in the next post on skip lists we should be able to have a working skip list that we can use with an iterator pattern.

PHP Iterator Design Pattern II: Binary Search

fastBinaryList Searches

To move ahead with implementations of the Iterator design pattern that help illustrate the value of polymorphic iterations, understanding how different list searches work is essential. In their Iterator illustration, Gamma et al provide an example of a program that has concrete iterators for both standard lists and for skip lists. Skip lists are made up of linked lists and use a binary search algorithm to quickly iterate through the skip list. So to understand skip lists you need to first understand both linked lists and binary searches. This post is dedicated to the latter—binary searches. A future post examines linked lists with skip lists.

Lists come in two basic flavors: ordered and unordered. Searches depend on finding a specific item in a list, and most algorithms are designed for ordered lists. For PHP developers, this can mean anything from searching for an item in a list from a database to an associative array. To get started, consider an associative array and finding a key using the built-in array_search($item, $list) method. The array_search(($item, $list) iterates through an array until it finds the element specified in $item in the array, $list and returns the key. To see a simple example, click the Play button and to download the file, click the Download button.QuizbinaryBtnDownload
All examples were tested on single core (e.g., Raspberry Pi) and multi-core (e.g., Dell, Macintosh) systems.

Using the array_search() method, you can use the key in the key-value pairs as information resources because the search returns the key instead of the value. You can see that in the following class:

< ?php
ERROR_REPORTING( E_ALL | E_STRICT );
ini_set("display_errors", 1);
//ArraySearch.php
class ArraySearch
{
    private $country;
    public function __construct($nation)
    {
         $this->country=$nation;
         //Generate list
         $currency=array('rupiah' => 'Indonesia','peso' => 'Uruguay', 'real' => 'Brazil', 'rupee' => 'India',
                         'dollar' => 'Liberia', 'euro' => 'Luxembourg', 'yen' => 'Japan', 'pound' => 'Egypt',
                         'yuan'  => 'China', 'dinar' => 'Serbia', 'shekel' => 'Israel', 'krone' => 'Norway',
                         'ringgit'  => 'Malasia', 'rial' => 'Iran', 'shilling' => 'Kenya', 'rand' => 'South Africa',
                         'ruble' => 'Belarus','dong' => 'Vietnam');
         echo "$this->country's currency is the " . array_search($this->country,$currency) . ". 

"
; } } $tour=$_POST['tour']; $worker = new ArraySearch($tour); ?>

For relatively short lists, the array_search() method works fine, and you don’t have to write (or remember!) a lot of code to work with it.

The Binary Search

binarySearchTo get a sense of how the binary search works, take a piece of paper (a discarded joke-a-day calendar page works well) and fold it into two equal halves. Keep folding it in half until you have 6 folds. After six folds I could not get a seventh fold—the paper was too fat by that point.

Instead of doubling something, if you halve a value, even very big numbers quickly are reduced to zero. For example, how many times do you think that it would take to half the number 1 million (1,000,000), rounding all halves down, before it became zero? Would it take 1000 halves? 10,000? 100,000? The following class loops through an operation that cuts the value in half with each iteration. It starts with a value of 1 million.

< ?php
//BinaryCut.php
class BinaryCut
{
    public function __construct()
    {
        $bignum=1000000;//1 million
        $count = 0;
        while($bignum > 0)
        {
            $bignum= floor($bignum / 2);
            ++$count;
            echo "iteration# $count : Number=$bignum 
"
; } } } $worker=new BinaryCut(); ?>

When you run the program, you can see how quickly 1 million is reduced to zero. That same principle is used in a binary search.
Continue reading ‘PHP Iterator Design Pattern II: Binary Search’