The Seven Percent Solution
Rod Stephens
In this months column Rod explains how you can make more than $5 million. Of course you need to have $100 million to spend and just the right combination of investment opportunities. Those are problems that you have to solve on your own. But Rod gives you a decision tree to help.
SUPPOSE you have $100 million to spend on one of two investments. The first costs $75 million and promises a seven percent return. The second costs $80 million and yields a five percent return. Seven percent of $75 million is $5.25 million and five percent of $80 million is $4 million. To make the most money, you should pick the seven percent solution.
Unfortunately, reality is rarely so simple. If you really had $100 million to spend, you would suddenly find you had dozensif not hundredsof new best friends, all willing to share the investment "deal of the century" with you. Obviously, even if you have a pretty good method for estimating the potential return from an investment, theres always a chance something will go wrong. In a complex mix of uncertain investments, picking the combination that will maximize your return is a much more difficult problem.
You can model this and many other difficult problems using decision trees. This month Ill explain what decision trees are and how you can use them to study these sorts of problems. Ill also describe two methods for searching a decision tree to find the best possible solution. Ill finish by describing a couple of heuristic methods for finding good solutions using decision trees. (Heuristic (adj. or noun): Of or relating to exploratory problem-solving techniques that utilize self-educating techniques. Related to "common sense," or "rule of thumb.") While heuristics arent guaranteed to produce the best solution, they often produce very good results for search trees that are too big for other methods.
The root of the problem
The investment problem I described earlier is actually a specific example of one of the classic problems in computer science: the knapsack problem. In this problem you have a "knapsack" of a certain size. You also have several objects of different sizes and values. Your goal is to fit objects into the knapsack to make the total value of the package as large as possible.
In the investment problem, the "knapsack" represents the amount$100 millionthat you have to spend, and the objects represent the potential investments. Their sizes are their costs and their values are their profits. When you pick your investments, their combined cost must be no greater than the $100 million knapsack size. You want to maximize the profit returned by your investments while satisfying this constraint.
One way to think of the knapsack problem is to askfor each objectwhether that object should be part of the solution. If you make these decisions one at a time, you can model them using a decision tree. Each node in the decision tree represents one of these questions. For example, the root node represents the question, "Should I spend money on the first investment?"
Each branch out of a node represents one of the possible answers. For consistency, you can assume the left branch represents a "yes" answer to a question and the right branch represents a "no" answer.
Figure 1 shows a decision tree for the two-investment problem. Notice that the leftmost path through the tree isnt feasible because it costs more than the $100 million spending allowance. Notice also that the rightmost path through the tree corresponds to making no investments, costs nothing, and produces no profit. In any knapsack decision tree, the rightmost path will be feasible since it costs nothing.
Climbing trees
Once you start thinking in terms of decision trees, you can convert the original problem into one of finding a path through the tree. The paths from the top to the bottom of the tree represent possible solutions to the knapsack problem. The branches you follow in a path indicate the decisions you should make to produce a particular solution. For instance, if you follow the "yes" branch at the second level in the tree, you should spend money on the second investment opportunity.
Your mission is to find the "best" path through the tree. In the knapsack problem you want to find the path to the leaf node with the greatest profit. In Figure 1, the second node from the left at the bottom of the tree has the largest profit, so the path to that node is the best path through the tree.
One way you could search for the best path is to examine them all. When you reach the bottom of the tree, first make sure the resulting solution is allowed. For example, in the investment example, the total cost of the investments in the solution must not exceed $100 million. Next, see if the result produced by the path is better than the best result youve found so far. If so, store it somewhere and continue checking the rest of the tree. When youre finished, youll have found the best possible combination of investments.
Since this method searches the entire tree, it is called an exhaustive search. As youll see when you run the example programs, this kind of search can indeed be exhausting. Listing 1 shows code that finds the best possible solution using an exhaustive search. The Cost and Profit arrays hold the costs and profits provided by the different investments.
The Exhaust subroutine takes as a parameter the index of an item. It recursively explores the solutions possible with that item either included or excluded from the test solution. For example, when the items index is three, the routine is considering a decision node at the third level of the decision tree. It first adds the item to the test solution and then recursively tests assignments for the remaining items. When that recursion returns, it excludes the item from the test solution and again recursively makes assignments for the other items.
When subroutine Exhaust is invoked for an item with an index greater than the total number of items available, it has already made assignments for all of the investment options. It then checks to see if the cost of the test solution fits within the spending allowance MAX_COST. If so, and if the profit provided by the test solution is greater than the best value found so far, the routine saves the test solution as the new best solution.
Listing 1. Solving the knapsack problem with exhaustive search.
' Allowed total cost.
Const MAX_COST = 100
' The number of items.
Dim NumItems As Integer
' The cost for item i.
Dim Cost() As Integer
' The profit for item i.
Dim Profit() As Integer
' Items in the test solution.
Dim TestUse() As Integer
' Profit in the test solution.
Dim TestProfit As Integer
' Cost in the test solution.
Dim TestCost As Integer
' Best profit so far.
Dim BestProfit As Integer
' Items in the best solution.
Dim BestUse() As Integer
Sub Exhaust (item As Integer)
Dim i As Integer
' If all the items are either in or out,
' see if this solution is better than the
' current best.
If item >= NumItems Then
If TestCost <= MAX_COST And TestProfit > _
BestProfit Then
' Save this solution.
For i = 0 To NumItems - 1
BestUse(i) = TestUse(i)
Next i
BestProfit = TestProfit
End If
Exit Sub
End If
' ***********************************
' Try adding the item to the solution.
' ***********************************
' Add the item to the solution.
TestUse(item) = True
TestCost = TestCost + Cost(item)
TestProfit = TestProfit + Profit(item)
' Recursively examine the possibilities.
Exhaust item + 1
' Remove the item from the solution.
TestUse(item) = False
TestCost = TestCost - Cost(item)
TestProfit = TestProfit - Profit(item)
' ****************************************
' Try keeping the item out of the solution.
' ****************************************
' Recursively examine the possibilities.
Exhaust item + 1
End Sub
The EXHAUST program on the Developers Disk demonstrates this algorithm. Enter the number of items you want the program to work with, press the Randomize button, and the program will create that number of random costs and profits. If you prefer, you can enter specific values yourself by typing them into the text boxes.
When the values are ready, press the Solve button and the program will perform an exhaustive search of the decision tree. The program will indicate which items belong in the best solution and the best solutions cost and profit. The program will also tell you the elapsed time and the number of nodes in the decision tree. Be sure you test the program with a small number of items until you know how long it will take on your computer.
From tiny acorns
From a tiny acorn grows the mighty oak. From only a few questions grows a huge decision tree. For the knapsack problem each question is answered by a "yes" or "no." That means each internal node in the decision tree has two branches. If there are N objects to consider, the tree will be N + 1 levels tall, so it will contain 2N+1 - 1 or roughly 2N+1 nodes. This is a lot of nodes. A decision tree representing 30 investments would contain more than two billion nodes. It will take even a fast computer a long time to search such a large tree exhaustively.
It takes my 90 MHz Pentium about one second to search the tree corresponding to a 15-item problem. It takes 33 seconds to search the tree for a 20-item problem. Its no coincidence that this is roughly 32 times longer. The tree for a 20-item problem contains five more items than the smaller problem so it will contain about 25 = 32 times as many nodes. Searching all of these nodes should take about 32 times as long.
Because decision trees grow exponentially, you can use exhaustive search to solve only the smallest problems. Given that my computer takes about one second to solve a 15-item knapsack problem, a 40-item problem would take 225 times as long or about a year. A 75-item problem would take more time to solve than the current age of the universe.
Pruning with a chainsaw
You can make decision tree searching much faster if you can prune some of the branches. For example, suppose you had $100 million to spend on any of 40 investments that each cost between $60 and $90 million. The complete decision tree for this problem contains 241 nodes or a bit more than 2 trillion nodes. [Imagine Bill Gates dilemma.Ed.]
On the other hand, since the investments are so expensive, you know that only one at a time will fit within the $100 million you have to spend. At the first node in the tree, you decide whether the first investment option is part of the test solution. If it is, you can stop searching since you know you dont have enough money to add another investment to the solution.
Similarly, whenever you add any investment to the solution, you can stop searching the tree. In this example, you really need to follow only 40 different paths from the root of the tree to the leavesone corresponding to including each investment in the solution. You can skip the other trillion or so possible paths.
The branch and bound strategy (see Listing 2) extends this idea to use upper and lower bounds on the test solution to prune the decision tree. When you ask the question, "Should this item be part of the test solution?" you first check to see if you have enough room left in your spending allowance. If the cost of the current test solution plus the cost of the new item is greater than the allowance, you dont need to recursively consider solutions that contain the item. In the previous example where every item is fairly expensive, this pruning strategy works well. After you have added one item to the test solution, the algorithm will prune every other branch from the tree that includes another item.
Another test you can use to prune the tree is to consider how much profit a branch could possibly bring. As you work through the tree, keep a running total of the amount of profit possible with items that you havent considered. If the current profit of the test solution plus this total unused profit is no larger than the profit of the current best solution, theres no point in considering the test solution further. Even if you added every other item to the test solution, it wouldnt be an improvement over the best solution you have found so far.
If you use these tests to trim the tree, then every time the routine reaches the bottom of the tree it has found an improved solution. If the solution isnt better, it will be trimmed in the previous step. At that step the profit given by the current test solution plus the maximum possible profit remaining must be greater than the profit of the current best solution. Otherwise the algorithm wont continueconsidering the solution. Similarly, the solution must fit within the spending allowance (our constraint), or youd have trimmed that branch off in the previous step.
Thus every time you reach the bottom of the tree, youve found a feasible solution thats better than any youve seen before. You can update the best solution whenever you reach the bottom of the tree without performing any other tests.
Listing 2. Branch and bound.
Option Explicit
' Allowed total cost.
Const MAX_COST = 100
' The number of items.
Dim NumItems As Integer
' The cost for item i.
Dim Cost() As Integer
' The profit for item i.
Dim Profit() As Integer
' Total profit of unused items.
Dim UnusedProfit As Integer
' Items in the test solution.
Dim TestUse() As Integer
' Profit in the test solution.
Dim TestProfit As Integer
' Cost in the test solution.
Dim TestCost As Integer
' Best profit so far.
Dim BestProfit As Integer
' Items in the best solution.
Dim BestUse() As Integer
Sub BranchAndBound (item As Integer)
Dim i As Integer
Visited = Visited + 1
' If we reached a leaf node, we know this
' solution is an improvement so save it.
If item >= NumItems Then
' Save this solution.
For i = 0 To NumItems - 1
BestUse(i) = TestUse(i)
Next i
BestProfit = TestProfit
Exit Sub
End If
' ***********************************
' Try adding the item to the solution.
' ***********************************
If TestCost + Cost(item) <= MAX_COST And _
TestProfit + UnusedProfit > BestProfit Then
' Add the item to the solution.
TestUse(item) = True
TestCost = TestCost + Cost(item)
TestProfit = TestProfit + Profit(item)
UnusedProfit = UnusedProfit - Profit(item)
' Recursively examine the possibilities.
BranchAndBound item + 1
' Remove the item from the solution.
TestUse(item) = False
TestCost = TestCost - Cost(item)
TestProfit = TestProfit - Profit(item)
UnusedProfit = UnusedProfit + Profit(item)
End If
' ****************************************
' Try keeping the item out of the solution.
' ****************************************
' Recursively examine the possibilities.
If TestProfit + UnusedProfit - Profit(item) _
> BestProfit Then
UnusedProfit = UnusedProfit - Profit(item)
BranchAndBound item + 1
UnusedProfit = UnusedProfit + Profit(item)
End If
End Sub
Program BRANCH on the Developers Disk uses both exhaustive search and branch and bound to solve the knapsack problem. Before you randomly create data, you can fill in the minimum and maximum values you want to allow for the costs and profits. If the items have large costs, branch and bound will be able to prune much of the tree and the program will examine a relatively small number of nodes. If the items have small costs, many items will be able to fit within a feasible solution, so the program will need to consider many more nodes and it will take longer. Try creating 20 random items with costs between 9 and 11. Since the items costs are similar and since many items will fit within the $100 million spending allowance, the program will need to consider many possible investment combinations.
Once the program has found a solution, it displays the items it selected, the cost and profit of the solution, and the time the program used. It also displays the number of nodes in the complete decision tree as well as the number of nodes it actually visited. Youll notice that branch and bound generally visits far fewer nodes than exhaustive search. For a 20-item knapsack problem, branch and bound might examine a few thousand or even just a few hundred nodes depending on the items costs and profits. An exhaustive search would examine all of the decision trees two million-plus nodesunless you powered down or rebooted in frustration, that is.
Close is better than nothing
If you add enough nodes to a decision tree, you can easily make a tree so large that even branch and bound wont be able to search it. For medium-sized problems, you could just let the program run for a month or two until it found a solution. Unfortunately, you wouldnt be able to use your computer for more important things like Solitaire and CompuServes chat rooms. For really large problems, you may as well not even bother trying.
To solve these really big problems, you can turn to a heuristic (pronounced yoo-RIS-tik). A heuristic is an algorithm that will probably give a pretty good solution, but isnt guaranteed to find the absolutely best solution possible. System expert Dave Prerau uses as a heuristic for not getting speeding tickets: if you drive between 5 and 10 miles per hour above the speed limit, you probably wont get a ticket, but theres obviously no guarantee. [Notice how circumspect Rod is on this issue, quoting an "expert" rather than offering his own heuristic for avoiding speeding tickets.Ed.]
For any given problem, you can come up with lots of different heuristics. Some will work better than others depending on the circumstances. One common type of heuristic is called a hill climbing algorithm (see Listing 3). At each step, a hill climbing heuristic attempts to move the solution as far as possible towards the goal of the problem. This kind of algorithm is called "hill climbing" because its similar to a lost hiker trying to find the top of a mountain in a fog or in the dark. While the hiker cant see where the top of the mountain is, he can try to get there by always moving uphill. Of course, a hiker in the dark may get stuck on the top of a small hill and not reach the highest point on the mountain. Hill climbing heuristics suffer from this same problem. The algorithm can get stuck at a local maximum and be unable to find the global maximum.
In the knapsack problem, the goal is to find the largest profit possible. At each step, a hill climbing algorithm would add to the test solution the investment that provided the largest possible profit while fitting within the $100 million spending allowance. That moves the solution closer to the goal of finding a large profit.
If you begin by sorting the investments according to their profits, you can just run through the sorted list in order adding the next item that fits within the allowance. Even if you dont sort the list, you can search through the entire list each time you need to find the next item very quickly. If there are N investments, you may need to examine the list up to N times. At most youll examine N * N = N2 items. This is far fewer operations than the 2N+1 required by an exhaustive search. For N = 40, hill climbing would require at most 1,600 operations while exhaustive search would examine all two trillion nodes in the decision tree.
Listing 3. Hill climbing.
Sub HillClimbing ()
Dim unspent As Integer
Dim best_i As Integer
Dim best_profit As Integer
Dim i As Integer
unspent = MAX_COST
Do
' Find the next item with the largest profit
' that will fit within the solution.
best_profit = -1
best_i = -1
For i = 0 To NumItems - 1
If Not BestUse(i) And Cost(i) <= _
unspent And Profit(i) > best_profit _
Then
best_i = i
best_profit = Profit(i)
End If
Next i
' If no more items will fit, we're done.
If best_i < 0 Then Exit Do
' Add the item to the solution.
BestUse(best_i) = True
BestProfit = BestProfit + best_profit
unspent = unspent - Cost(best_i)
Loop
End Sub
Least cost
Another common heuristic strategy is the least cost approach (see Listing 4). At each step instead of adding the item that moves the solution as close to the goal as possible, this strategy adds the item that has the smallest cost. This allows the algorithm to fit as many items as possible into the solution. If the items profits are roughly equal, the least cost strategy will produce a good solution.
The structure of a least cost heuristic is similar to that of a hill climbing algorithm. You can get the best performance by initially sorting the items according to their costs and then adding them in order until no more items will fit. Even if you search the entire list each time you need to pick an item, the least cost heuristic requires at most N2 steps rather than the 2N+1 required by an exhaustive search.
Listing 4. Least cost.
Sub LeastCost ()
Dim unspent As Integer
Dim best_i As Integer
Dim best_cost As Integer
Dim i As Integer
unspent = MAX_COST
Do
' Find the next item with the smallest
' cost that will fit within the solution.
best_cost = 32767
best_i = -1
For i = 0 To NumItems - 1
If Not BestUse(i) And Cost(i) <= unspent _
And Cost(i) < best_cost Then
best_i = i
best_cost = Cost(i)
End If
Next i
' If no more items will fit, we're done.
If best_i < 0 Then Exit Do
' Add the item to the solution.
BestUse(best_i) = True
BestProfit = BestProfit + Profit(best_i)
unspent = unspent - Cost(best_i)
Loop
End Sub
Its a jungle out there
Program HEUR on the Developers Disk demonstrates exhaustive search, branch and bound, hill climbing, and the least cost heuristic. Be careful not to use exhaustive search on large problems. Try the program with several different ranges for costs and profits. Youll find that a least cost strategy works well when the range of profits is small. Hill climbing generally works well when the range of costs is small.
For any given problem, you can come up with many other heuristics. You might try a greatest cost strategy to try using up all of the spending allowance. You could pick random items to add to the solution until no more will fit. Since different trials using a random solution will give different results, you might examine several hundred or even several thousand random solutions.
For different data sets, different heuristics will be the best. Since these algorithms tend to be extremely fast, you should run as many as possible and pick the solution that is the overall winner. While you may not get the best result possible, you should get a solution that is "good enough" to be useful. Happy heuristics!
Rod Stephens is president of Rocky Mountain Computer Consulting Inc., a custom software firm in Boulder, Colorado. His book Visual Basic Algorithms describes dozens of algorithms like these. He is currently working on a book about graphics in VB. CompuServe 102124,33.
To find out more about Visual Basic Developer and Pinnacle Publishing, visit their website at: http://www.pinppub.com/vbd/
Note: This is not a Microsoft Corporation website.
Microsoft is not responsible for its content.This article is reproduced from the September 1996 issue of Visual Basic Developer. Copyright 1996, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Visual Basic Developer is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call (800) 788-1900 or (206) 251-1900.