Chapter 1:
Introduction

Download the code

Hi there! Today, we're going to talk about searching. No, not searching for ourselves—this isn't a philosophy course, after all. Instead, we're going to discuss searching for data. As mentioned in the last lesson, every good content management system provides an interface to help visitors search for information. Forcing visitors to manually wade through your Web site to find something is not only rude, but it can needlessly increase your site's traffic.

We'll use a few tricks to implement the search function in our Recipe Center. First, we'll look at how to use the SELECT SQL statement to efficiently search through all our data records to find specific items that visitors request. Next, we'll examine how some of the biggest players on the Web implement search engines in their sites and see if we can borrow a few of their ideas.

Finally, we'll look at the best way to provide a search facility in our dynamic Web pages and how to display the search results so visitors can best utilize the information.

Let's head to Chapter 2 and start our searching.

Chapter 2:
Searching the Database

For the next couple of chapters, we need to talk a little theory. But I promise not to make it boring. We'll stick to the essentials so that when you get to actually start writing code in Chapter 4, you'll know what you're doing. Trust me; it will be worth the wait.

Earlier in the course, we discussed how to use the SQL SELECT statement to query data contained in our database. We saw how to use the WHERE clause to help us filter out only the specific records we're interested in. Now we'll expand on that and learn how to do more advanced searching through our data.

So far in this course, when we've used a WHERE clause in our SQL programming, we've compared the data field against a literal data value. Using this technique, you get all the records that match that exact literal value. Thus, the SELECT statement:

SELECT * FROM comments WHERE recipeid = 3

returns only the data records that contain exactly the value 3 in the recipeid data field. This type of searching is called an exact match. We specify the exact data that the data field contains.

You can also perform exact matches on string values, for example:

SELECT * FROM comments WHERE poster = 'rich'

This query returns the data records that contain the string value rich in the poster data field. This kind of searching is easy to implement for short string data fields such as the poster data field, but unfortunately, it's not too useful when searching for data within a long text data field.

In most cases, visitors will want to search an entire text string for a specific substring value. For example, they might want to search the recipes table for all recipes that contain the word chicken in the title data field.

Obviously, we don't want to just return data records where the title only says chicken. Instead, we'd like to return data records with titles such as chicken dumplings, roasted chicken, or spicy chicken wings. This requires something more than just an exact match string search.

In each of these cases, the search word chicken appears in different places in the text string. So somehow we need to be able to search within the string value to look for a specific word.

This type of searching is called an inexact match. In an inexact match, you're searching for a string within a longer text string. Since this is a common function in the database world, the SQL language provides a solution for us.

The LIKE Condition

The LIKE condition is part of the WHERE clause in the SELECT statement. It allows us to add some ambiguity to our queries. It enables us to specify a pattern that the MySQL server matches against the data field values you specify in the WHERE clause. The format of a SELECT statement with a LIKE condition is:

SELECT datafieldlist FROM tablelist WHERE datafield LIKE 'pattern'

The pattern used in the LIKE condition specifies what string pattern the query should search for. You must enclose the pattern in either single or double quotes to denote the start and end of the pattern string.

The pattern value can consist of normal literal string characters, along with two types of wildcard characters:

You should use an underscore wildcard character when you want to search for words that differ by a single character. For example, the SQL query:

SELECT * from commentsWHERE poster LIKE '_ich'

will return data records that contain the values rich or Rich (or any other four-character word ending in ich). You can use the underscore character anywhere in the pattern and as many times as you want; for example:

SELECT * from commentsWHERE poster LIKE '_harl__'

This code will return data records that contain the values Charles, charles, Charlie, or charlie. You can now start to see how versatile this feature is.

The downside to the underscore wildcard character is that you can only specify one character at a time. If you need to match a pattern of unknown characters, you'll need to use the percent wildcard character.

You can use the percent wildcard character to replace zero or more characters in a pattern. For example, to retrieve all data records that have a poster value starting with r, you can use the statement:

SELECT * from commentsWHERE poster LIKE 'r%'

This pattern would match any poster value starting with r, no matter how many characters were in the string, such as rich, richard, rob, or robert.

Remember, the percent wildcard character can also match zero characters, so the query:

SELECT * FROM commentsWHERE poster LIKE 'rich%'

will return data records with the poster value rich as well as richard.

importantImportant: By default, MySQL uses case insensitive searches in the LIKE patterns. Thus, the pattern rich will match values of both Rich and rich.

You can also use multiple percent symbols within a pattern to match zero or more characters in a string. Going back to our original example of looking for all recipes that have the word chicken in the title, we would use the query:

SELECT * FROM recipes WHERE title LIKE '%chicken%'

This pattern would match the data values Chicken dumplings, roasted chicken, and of course, spicy chicken wings.

Returning the Opposite

Finally, one last feature you should be familiar with is the NOT condition. You can add this to the LIKE condition to return the opposite of your search pattern. For example, the query:

SELECT * FROM comments WHERE poster NOT LIKE 'rich'

will return data records where the poster value is equal to anything but the value rich. We won't be using this feature in our application, but it's there if you ever need it.

Now let's move on to Chapter 3 and see if we can learn some tricks from how the large Web sites implement searching.

Chapter 3:
Developing a Search Strategy

Now that you've got a handle on how to write SQL queries to search for data, we need to decide how we're going to implement searching in the Recipe Center application. There are plenty of things to search for in the Recipe Center data. The trick is in knowing what to search for and how to search for it.

Commercial search companies such as Google and Yahoo spend millions of dollars developing custom search algorithms. We won't spend nearly as much money (or time) coming up with our search strategy, but there are a few things we can learn from those guys.

Just about everyone who uses the Internet is accustomed to how searching works in Google and Yahoo. In a way, this is unfortunate for us because it means we must be careful not to implement a search function that's so different from those sites that it will confuse our visitors.

Here's an example of what I mean. What happens when visitors enter a multiword search, such as chicken casserole? In the Google or Yahoo worlds, such a search returns Web sites that contain the words chicken and casserole anywhere on the Web page. But how would we handle this situation?

Using our newfound friend the LIKE condition, we could create the query:

SELECT * from recipes WHERE title LIKE '%chicken casserole%

This will work fine for recipes such as Chicken Casserole or Chicken Casserole Pot Pie. But what if we also have a recipe called Chicken and Cheese Casserole? The query will fail to retrieve this data record because in the query there is text between the words chicken and casserole.

To solve this problem, we could change the LIKE condition to be '%chicken%casserole%', using another wildcard character between the two search words. This would solve some problems, but it still wouldn't work quite right. It would fail to retrieve the data record Cheese casserole with chicken (because the pattern is looking for the word chicken to come before the word casserole). Are you starting to see the dilemma those fancy search companies have created for the rest of us?

The Google and Yahoo search engines split multiword phrases apart and then search for each word in the phrase anywhere in the content. Our job is to try to mimic this functionality as cleanly as possible (and without the fancy million-dollar search algorithms).

The AND Condition

To be able to search for words anywhere in a data field, we'll need to use the Boolean AND condition in our SELECT statement to link together multiple LIKE conditions. That sounds like a mouthful, but it's really fairly simple:

SELECT * from recipes WHERE title LIKE '%chicken%' AND title LIKE '%casserole%'

The AND condition requires both LIKE conditions to be met in order for the query to return the data record. The problem with this SQL statement is that it is extremely resource intensive. It requires the MySQL server to search through the entire data field twice looking for the patterns. Each additional search word the requestor supplies requires yet another full scan of the data field. This is very inefficient.

To solve this problem, the big search engines create index tables of search words (also called keywords). The index table references each location a keyword is found. The index tables even account for common misspellings so the search engine can correct your typos as you search.

We won't get that fancy in implementing our search engine (although feel free to experiment on your own if this sort of thing interests you). Instead, to try to limit our search times, we'll limit our searches to words found in the recipe titles. Since this is a relatively small data field, searching for two or three keywords shouldn't have much impact on our application's performance.

Now that we know how to perform a SELECT query to search for words anywhere in our data, we need to be able to build the query. The tricky part is splitting the search string into separate words and plugging each word into the SELECT statement. Fortunately, the wonderful people who created PHP have some easy solutions for us.

PHP String Handling Functions

The explode() PHP function takes a string and splits it into an array based on a separation string (called a delimiter). The format of this function is:

array explode(string $delimiter, string $string)

The phrase stored in the $string variable is split at every $delimiter string. The function stores the individual words in an array variable. For example, the code:

$words = explode(" ", $phrase)

splits each word separated by a space in the $phrase string into an array element stored in the $words array variable. You can then reference the individual words in the original string using the numerical array elements (such as $words [0] for the first word, $words [1] for the second, and so on).

This solves the problem of breaking out the individual search words. Next, we need to create a SELECT statement using each of the words we extracted from the search string. We'll use another PHP function to help us with that.

The opposite of the explode() function is the implode() function. This nifty little function takes elements from an array variable and puts them back into a single string variable. The format of the implode() function is:

string implode(string $glue, array $pieces)

The implode() function uses the string $glue to separate the individual array elements contained in the array $pieces. The resulting string will look like this:

$pieces [0] $glue $pieces [1] $glue $pieces [2] . . .

You can use the $glue feature to insert any type of string between the data elements. So now we can build our SELECT WHERE clause:

$phrase = implode("%' AND title LIKE '%", $words)

The code uses the string %' AND title LIKE %' as the glue string. This ensures that each array element is followed by a wildcard character, the AND title LIKE phrase, and the next array element is preceded by a wildcard character. What this does is create the string:

chicken%' AND title LIKE '%casserole

Now all we need to do is add the beginning and ending wildcard characters and the rest of the SELECT statement:

$query = "SELECT * from recipes WHERE title LIKE '%$phrase%'"

Now we've got the beginning quote and percent, the search words split into separate LIKE conditions, and an ending quote. That's exactly what we were looking for.

But what if the visitor only searches on a single word? Fortunately for us, the implode() function doesn't add the $glue string if only one element is present in the array. (Those PHP programmers are pretty smart!) Now we've got all of our bases covered.

Okay, if I've put you to sleep with all this theory, you can wake up now. We're ready to move on to Chapter 4 and start writing some code.

Chapter 4:
Adding Searching
to the Recipe Center

Let's put our newfound knowledge of searching to use within the Recipe Center application.

First we need to decide how to offer the search feature to our Web page visitors. You have a few different options available, such as creating a separate search page or incorporating a search form within your home page.

Including it on the home page is starting to become standard for larger Web sites. This is usually a small text box set aside somewhere on the page, along with a submit button next to it. This allows visitors to quickly and easily find items within the site.

We'll use this strategy for the Recipe Center. If you remember from earlier in the course, the navigation cell within our home Web page table includes a simple form for performing searches. Here's a reminder of how this looks:

The Recipe Center's home page.

The Recipe Center's home page

The nav.inc.php file produces the content in the navigation cell. Let's revisit that code:

Print code

<table width="100%" cellpadding="3">
  <tr>
    <td><h3>Welcome</h3></td>
  </tr>
  <tr>
    <td><a href="index.php"><strong>Home</strong></a></td>
  </tr>
  <tr>
    <td><a href="index.php?content=login"><strong>Login to post</strong></a></td>
  </tr>
  <tr>
    <td><a href="index.php?content=register"><strong>Register for free login</strong></a></td>
  </tr>
  <tr>
    <td><hr size="1" noshade="noshade" /></td>
  </tr>
  <tr>
    <td bgcolor="#FFFF99"><a href="index.php?content=newrecipe"><strong>Post a new recipe</strong></a></td>
  </tr>
  <tr>
    <td> </td>
  </tr>
  <tr>
    <td>
<form action="index.php" method="get">
    <label><font color="#663300" size="-1">search for recipe:</font> </label>
      <input name="searchFor" type="text" size="14" />
      <input name="goButton" type="submit" value="find" />
      <input name="content" type="hidden" value="search" />
  </form>  </td>
  </tr>
</table>

The form in the navigation cell consists of a simple text box, which supplies:

When a visitor enters a value in the text box and clicks the find button, the form calls the index.php file using the search.inc.php file for the main cell content.

Creating the Search Results Page

The search.inc.php include file does all the work of searching for the keyword or keywords within the recipe titles and displaying the results in the main cell area. Let's build the search.inc.php file:

  1. Create a new text file named search.inc.php in the recipe folder of the WAMP5 www area
  2. Enter the following code into the file:

    Print code

    <?php

    $con = mysql_connect("localhost", "test", "test") or die('Sorry, could not connect to server'); mysql_select_db("recipe", $con) or die('Sorry, could not connect to database');

    $search = $_GET['searchFor']; $words = explode(" ", $search); $phrase = implode("%' AND title LIKE '%", $words); $query = "SELECT recipeid,title,shortdesc from recipes where title like '%$phrase%'"; $result = mysql_query($query) or die('Could not query database at this time');

    echo "<h1>Search Results</h1><br><br>\n"; if (mysql_num_rows($result) == 0) { echo "<h2>Sorry, no recipes were found with '$search' in them.</h2>"; } else { while($row=mysql_fetch_array($result, MYSQL_ASSOC)) { $recipeid = $row['recipeid']; $title = $row['title']; $shortdesc = $row['shortdesc']; echo "<a href=\"index.php?content=showrecipe&id=$recipeid\">$title</a><br>\n"; echo "$shortdesc<br><br>\n"; } } ?>

The search.inc.php code first connects to the local MySQL server and selects the recipes database. It then retrieves the search phrase the visitor entered into the search textbox from the HTML variable and assigns it to the $search PHP variable.

From there, you should recognize our new search code to create the SELECT query statement. The code explodes the search phrase and then implodes it using the LIKE condition text.

Once the code creates the SELECT statement, it sends it off to the MySQL server and retrieves the results. If any data records are returned, they are processed within the while() loop.

Inside the while() loop, the program first assigns the retrieved data fields to PHP variables, and then it uses those variables to create a simple link for each recipe:

echo "<a href=\"index.php?content=showrecipe&id=$recipeid\">$title</a><br>\n";
echo "$shortdesc<br><br>\n";

The first echo statement creates a link passing the showrecipe include file name and the recipeid data value. The link uses the title of the recipe as the text. When a Web page visitor clicks the link, the code calls the index.php program using the showrecipe include file in the main cell area, which displays the recipe text.

Following the recipe link, the code displays a short description of the recipe. The finished search results Web page looks like this:

The search results Web page

The search results Web page

That's starting to look pretty professional! And we did it all with just some simple PHP coding.

Chapter 5:
Summary

Today, we discussed searching. Searching requires that you perform an inexact match query on string data fields. You can use the SQL LIKE condition, along with wildcard characters, to perform an inexact match query. The SQL LIKE condition allows you to search long text fields looking for single keywords or phrases.

We also covered how to implement a search engine strategy in your Web applications. We touched on how Internet users expect search engines to work like Google and Yahoo. While we can't duplicate Google's or Yahoo's exact functionality, we can create SQL statements to produce similar results. We saw how to break a search phrase down into the individual word elements and then search a data field for those elements.

Finally, we walked through the PHP code required to implement our search strategy in the Recipe Center application. We used the PHP explode() and implode() functions to help build our SQL query statements from the search phrase words our visitors provide.

Things are looking pretty good for our Recipe Center. We've incorporated lots of functionality that commercial sites use, giving our Web site a professional look and feel. In the next lesson, we'll examine how to implement printing in our application. This is another area where visitors expect a certain behavior based on how commercial Web sites operate. In Lesson 8, we'll make sure we don't disappoint them.


Supplementary Material



The Anatomy of a Search Engine
http://infolab.stanford.edu/~backrub/google.html
If you're into math, check out this paper describing some of the math behind how Google produces Web page rankings in its search engine.

Search Engine Roundtable
http://www.seroundtable.com/
This is a forum that follows new developments in the search engine world.

FAQs


Q: Is there a limit to how many keywords you can add to the SQL SELECT statement using the AND LIKE condition?

A: Yes, there is a limit to the total length of an SQL statement in MySQL. By default, the total size of an SQL statement is limited to 64 KB. Since each character takes one byte, this gives you over 64,000 characters in your SQL statement! I'm not sure I want to see a 64,000 character SQL SELECT statement, but you can do it if you need to.


Q: Can I implement a more advanced algorithm for searching in the Recipe Center application?

A: Of course, you can modify the application to your heart's content. If you come up with a more imaginative algorithm, please share it in the Discussion Area for the rest of us to play with (unless, of course, you plan on patenting your creation).


Assignment


Learning how to write good search SQL code requires practice and experience. Today's assignment will give you some.

Start the phpMyAdmin tool, select the recipes database, and try writing various search SQL SELECT statements using the LIKE condition patterns. Try writing your SELECT statement to search for the search keywords in other data fields, such as the shortdesc, ingredients, or even the directions data fields.

Does which data field you choose affect how long the queries take? (You might have to enter a lot of data records before you notice much of a difference.)