Moving average-based investments tested
24 May 2010 15:17
I found a website which gives some advice for swing traders. Reading it, I kept wanting to shout "citation needed!" Since I couldn't find any research by anyone else, I thought I would do some tests myself.
The first thing I wanted to test related to the use of moving averages. The simplest test I could think of was to compare holding stocks only when the 10-period simple moving average (SMA) is above the 30-period exponential moving average (EMA) against holding them continually. The advice on the website looks dubious straight away: in its example chart, the stock price at the point where the presumed downward trend starts is actually slightly lower than the price at the point where it ends.
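For reference, the two averages can be sketched in Python as below. This is not the code used for the post: it assumes the common EMA convention of a smoothing factor of 2/(N+1), seeded with the first close, and the post doesn't say which convention it used.

```python
from typing import List, Optional


def sma(closes: List[float], period: int) -> List[Optional[float]]:
    """Simple moving average of the last `period` closes (None until enough data)."""
    out: List[Optional[float]] = []
    window_sum = 0.0
    for i, price in enumerate(closes):
        window_sum += price
        if i >= period:
            window_sum -= closes[i - period]  # value that just left the window
        out.append(window_sum / period if i >= period - 1 else None)
    return out


def ema(closes: List[float], period: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (period + 1).

    Assumption: seeded with the first close; seeding with an initial SMA
    is another common choice and gives slightly different early values.
    """
    alpha = 2.0 / (period + 1)
    out: List[float] = []
    for price in closes:
        prev = out[-1] if out else price
        out.append(alpha * price + (1 - alpha) * prev)
    return out
```

The strategy in this post would then compare `sma(closes, 10)` with `ema(closes, 30)` day by day.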
Choosing companies to test
Selecting companies to test the strategy with wasn't easy. Since any investing I do is likely to start in London, I wanted to test with London companies. The FTSE 100 is an obvious choice, but testing with the current FTSE 100 companies biases the results towards those which have been successful over the testing period. Surprisingly, there doesn't seem to be a source which gives a list of the companies in the FTSE 100 on a given date. There is a list of FTSE 100 constituent changes, but it looks slightly suspicious as it shows no changes since 2008; it also doesn't give the companies' symbols, making them harder to look up.
The LSE also provides an archived list of all companies, which again fails to give any symbols.
For the sake of getting something done, I downloaded data for the current FTSE 100 companies from Yahoo Finance, and then removed any companies for which I didn't have data from January 2003 to March 2010. This left me with 70 stocks. This is clearly still biased, but probably not in a way that would significantly affect this experiment.
I compared the result of putting £1 into shares of each of the companies and leaving it there with that of a strategy which only holds shares when they are supposedly trending upwards.
My moving averages were calculated on closing values. If the short-term moving average was above the long-term average and my money for that stock was currently in cash, I converted it to shares at the next day's opening price. Conversely, if the short-term average moved below the long-term average and I was holding shares, I sold them at the next day's opening price.
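The coverage filter might look like the following minimal sketch. The `history` mapping (symbol to list of trading dates) and the function names are hypothetical; it checks the span at month granularity and does not look for gaps inside the period.

```python
import datetime
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]


def covers_period(dates: List[datetime.date], first: YearMonth, last: YearMonth) -> bool:
    """True if there is data in or before month `first` and in or after month `last`.

    Month granularity means a holiday on the very first day of the period
    (e.g. 1 January) doesn't disqualify a stock. Gaps are not checked.
    """
    if not dates:
        return False
    start, end = min(dates), max(dates)
    return (start.year, start.month) <= first and (end.year, end.month) >= last


def filter_symbols(history: Dict[str, List[datetime.date]],
                   first: YearMonth = (2003, 1),
                   last: YearMonth = (2010, 3)) -> List[str]:
    """Symbols whose data spans January 2003 to March 2010 by default."""
    return sorted(s for s, dates in history.items() if covers_period(dates, first, last))
```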
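A minimal sketch of this rule for a single stock, assuming the averages have already been computed from the closes (for example a 10-day SMA and a 30-day EMA, with None where there isn't enough history yet). The function name and signature are illustrative, not the code behind the results in this post.

```python
from typing import Optional, Sequence


def simulate(opens: Sequence[float], closes: Sequence[float],
             short_ma: Sequence[Optional[float]], long_ma: Sequence[Optional[float]],
             stake: float = 1.0) -> float:
    """Moving-average crossover on one stock, starting with `stake` in cash.

    The averages for day i are based on closes up to day i; any resulting
    trade is filled at day i+1's open. The position is valued at the final
    close. Dealing costs, spreads, dividends and splits are ignored.
    """
    cash, shares, holding = stake, 0.0, False
    for i in range(len(opens) - 1):
        fast, slow = short_ma[i], long_ma[i]
        if fast is None or slow is None:
            continue  # not enough history for the averages yet
        if fast > slow and not holding:
            shares, cash, holding = cash / opens[i + 1], 0.0, True  # buy at next open
        elif fast < slow and holding:
            cash, shares, holding = shares * opens[i + 1], 0.0, False  # sell at next open
    return cash + shares * closes[-1]
```

For the buy-and-hold comparison, `stake / opens[0] * closes[-1]` would be one equivalent, assuming the initial purchase is made at the first open.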
A naive first run (ignoring stock splits) gave the following results: holding the shares from January 2003 until March 2010 would turn your £70 investment into £168, while following the trends would return £113.
The site suggests only holding stocks when the trends are strong and the averages are well separated. So I re-ran the simulation, holding stocks only when the averages were separated by 1%, and then by 0.5%. The 0.5% run gave an amazing return, so I decided to investigate.
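The separation variants only change the holding condition. One way to express it is below, assuming the same percentage test is used for selling as for buying (the exit rule for these variants isn't spelled out here). The function is a hypothetical illustration.

```python
from typing import List, Optional, Sequence


def hold_signal(short_ma: Sequence[Optional[float]], long_ma: Sequence[Optional[float]],
                threshold: float = 0.0) -> List[bool]:
    """Per-day flag: is the short average more than `threshold` above the long one?

    threshold=0.01 requires a 1% gap, threshold=0.005 a 0.5% gap, and
    threshold=0.0 reduces to the plain crossover. Assumption: the same test
    is used for exits, so a position is dropped as soon as the gap narrows
    below the threshold.
    """
    return [fast is not None and slow is not None and fast > slow * (1 + threshold)
            for fast, slow in zip(short_ma, long_ma)]
```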
After a bit of digging, I found data which looked like this (columns: date, open, high, low, close, volume, adjusted close):
2003-06-23,446.25,455.00,438.25,444.25,3532800,339.30
2003-06-20,457.50,462.25,448.25,452.00,4408700,345.22
2003-06-19,4.75,477.00,456.75,458.25,9434500,349.99
2003-06-18,472.25,476.00,463.75,475.50,9533100,363.17
2003-06-17,459.50,474.50,457.00,470.00,10067300,358.97
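The columns follow Yahoo's CSV layout (date, open, high, low, close, volume, adjusted close). Here is a sketch of loading such rows into a `Day` namedtuple with the same fields as the records quoted further down. The `parse_rows` helper is hypothetical, and it sorts by date because the sample is listed newest first.

```python
import csv
import datetime
from collections import namedtuple
from typing import Iterable, List

# Field names match the Day records quoted later in the post.
Day = namedtuple("Day", "date open high low close volume adjclose")


def parse_rows(lines: Iterable[str]) -> List[Day]:
    """Parse Yahoo-style CSV rows (Date,Open,High,Low,Close,Volume,Adj Close)."""
    days = []
    for row in csv.reader(lines):
        if not row or row[0] == "Date":
            continue  # skip blank lines and the header row
        date = datetime.datetime.strptime(row[0], "%Y-%m-%d").date()
        days.append(Day(date, *(float(x) for x in row[1:7])))
    days.sort(key=lambda d: d.date)  # oldest first
    return days
```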
On the middle day, the opening value is only 1% of what it should be. Ahem. That's not useful data, and it's obviously wrong, since the opening value is less than the day's low. Time to check the data: how often are the opening and closing values not between the day's high and low? 1400 times. Oh dear.
Okay, how many are more than 2% out? 135.
How many are more than 5% out? 45. But only two of these fall after the start of 2005. They are these values:
('GSK.L', Day(date=datetime.date(2009, 7, 28), open=1168.5, high=1182.0, low=1167.0, close=1000.0, volume=1731000.0, adjclose=1000.0))
('RBS.L', Day(date=datetime.date(2008, 9, 19), open=238.0, high=252.5, low=227.0, close=213.5, volume=298901500.0, adjclose=213.5))
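These range checks can be sketched as follows, reusing the `Day` fields from the records just quoted. How "more than 2% out" was measured isn't stated; this assumes the tolerance is a fraction of the day's low or high. Both records above are flagged at 5%.

```python
import datetime
from collections import namedtuple

Day = namedtuple("Day", "date open high low close volume adjclose")


def out_of_range(day: Day, tolerance: float = 0.0) -> bool:
    """True if the open or close lies outside [low, high] by more than `tolerance`.

    Assumption: `tolerance` is a fraction of the low/high, so 0.05 means
    "more than 5% below the low or above the high".
    """
    low = day.low * (1 - tolerance)
    high = day.high * (1 + tolerance)
    return not (low <= day.open <= high and low <= day.close <= high)


def count_out_of_range(days, tolerance: float = 0.0) -> int:
    """Number of days failing the range check."""
    return sum(1 for d in days if out_of_range(d, tolerance))


gsk = Day(datetime.date(2009, 7, 28), 1168.5, 1182.0, 1167.0, 1000.0, 1731000.0, 1000.0)
print(out_of_range(gsk, 0.05))  # True: the close is about 14% below the low
```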
For GSK it appears that the close value is wrong. Substituting in the low value probably wouldn't be too bad. For RBS it's harder to see what's happened. The next opening was 215, so it's not impossible that the closing value is correct and it's the low that's wrong.
For the current simulation this isn't going to be a big factor, but the fact that the data are this flaky casts serious doubt on any simulation results. I also have no idea whether there are other significant errors which aren't caught by the above check.
Other data sources
Google's historical data seems pretty similar to Yahoo's. I subscribed to Reuters DataLink in order to get better data, but after persuading a Windows user to allow me to install the client application on her computer we got lots of errors about unknown ticker symbols and failed to get any data at all. The fact that you have to use the standard client makes the data source pretty useless anyway, but it would be nice to see whether someone has accurate historical data. Anyone know where Yahoo's data comes from?
Plotting what happened
I was surprised by how hard it is to test even simple data-analysis code. Even when you're confident that all the components work as expected, it's still easy to connect them together wrongly, and once they're connected it's pretty hard to verify that they're working correctly. I think that, using a step function as input data, it should be possible to predict by hand the expected output of moderately complex functions, but I haven't implemented this, and it still wouldn't be an obviously-correct test.
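A small version of the step-function idea, applied to a plain SMA. The expected list was worked out by hand: after a step from 0 to 10, a 4-day SMA should climb by 10/4 per day for four days and then stay flat. Both functions are hypothetical examples, not the post's code.

```python
from typing import List, Optional


def sma(closes: List[float], period: int) -> List[Optional[float]]:
    """Plain simple moving average (the function under test)."""
    return [None if i < period - 1 else sum(closes[i - period + 1:i + 1]) / period
            for i in range(len(closes))]


def test_sma_on_step() -> None:
    # Input: 5 days at 0, then 10 days at 10 (a single upward step).
    step = [0.0] * 5 + [10.0] * 10
    result = sma(step, 4)
    # Worked out by hand: flat at 0, a 4-day ramp of 2.5 per day, then flat at 10.
    expected = [None, None, None, 0.0, 0.0, 2.5, 5.0, 7.5,
                10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
    assert result == expected, result


test_sma_on_step()  # raises AssertionError if the SMA is wired up wrongly
```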
In order to try to visualise what is going on, I decided to plot the stock values and when the algorithm is holding them. Here (warning: 15MB!) is the format I came up with for the S&P 500 (see later for why). The value of each line is the log of the percentage movement of each close value from the previous day. The three shades of blue indicate what is held by the straight-forward algorithm, and also the 1% difference and 2% difference variants of it. I haven't worked out how to add a variable horizontal scale yet; this graph is from March 2002 to May 2010.
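Taking "the log of the percentage movement" to mean the daily log return, ln(close[t] / close[t-1]) (an assumption about what the chart plots), the series could be computed like this. On that scale a 50% fall, such as an unrecorded 2:1 split, shows up as a single value of about -0.69.

```python
import math
from typing import List, Sequence


def log_returns(closes: Sequence[float]) -> List[float]:
    """Day-on-day log returns, ln(close[t] / close[t-1])."""
    return [math.log(cur / prev) for prev, cur in zip(closes, closes[1:])]
```

A log scale has the useful property that a doubling and a halving plot as moves of equal size in opposite directions.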
One weakness is that here we only plot closing values (on which we base our investment decisions), not opening values (which are the prices we actually pay).
Using the charts it was easy to spot anomalies in the data. Here are a couple:
Here's one where the adjusted close applies the stock split in the wrong ratio, which is a good reason not to rely on Yahoo's adjusted values: http://finance.yahoo.com/q/hp?s=ULVR.L&a=04&b=16&c=2006&d=04&e=25&f=2006&g=d
Toby at Timetric helpfully pointed out that Yahoo's US price data seems much better, so I downloaded prices for the S&P 500. There do seem to be fewer errors, but there are definitely still some. For example, here are two examples where the stock split ratio seems to have been applied to a couple of dates before the split.
In fact, this pattern seems to occur in about eight stock splits from 2001. However, there don't seem to be any instances after that, so I decided to work with S&P 500 data from 2002 onwards.
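Spotting these on the chart works, but a crude automated check is to flag daily falls that match a common split ratio. This is a hypothetical heuristic, not something used in the post: the ratio list and the 5% tolerance are arbitrary, it only looks for falls (not reverse splits), and a genuine 50% drop would be flagged too.

```python
from typing import List, Sequence, Tuple

# Hypothetical list of split ratios to look for; extend as needed.
COMMON_RATIOS = (2.0, 3.0, 1.5, 4.0)


def suspected_splits(closes: Sequence[float], tolerance: float = 0.05,
                     ratios: Sequence[float] = COMMON_RATIOS) -> List[Tuple[int, float]]:
    """Return (index, ratio) pairs where close[i] is close to close[i-1] / ratio.

    These are candidates to check against the known split dates, not proof
    of a missing or misapplied split.
    """
    hits = []
    for i in range(1, len(closes)):
        move = closes[i] / closes[i - 1]
        for r in ratios:
            if abs(move * r - 1) <= tolerance:
                hits.append((i, r))
                break  # report only the first matching ratio
    return hits
```

A flagged day that falls a couple of trading days before a recorded split date would match the misapplied-split pattern described above.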
Stock splits data
In order to do any accurate simulation I need data about stock splits. I think Yahoo has this data pretty accurately, but getting hold of it is a pain. It appears in at least three places:
- At the bottom of the standard graph of stock prices
- In the table of monthly prices, though not in the CSV version, so that's not good
- In the table of dividend data, but this data only appears when there is at least one dividend paid, and again not in the CSV version
None of these are very useful, so I ended up parsing the human-readable page. That wasn't too bad, except that the 1960s (!) splits have dates of the form %Y-%m-%d instead of %b %d, %Y. Since some of the ratios are 102:100, I'm not even sure those entries are right. Anyway, after cleaning up this data I finally got to re-plot my graph, and to my surprise it removed most of the 'icicles' (sudden apparent price drops) first time.
One exception was a 2:1 stock split of Dean Foods (DF) before trading on 24 April 2002, which was missing from the data. However, it seemed to be the only obvious missing split.
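A sketch of coping with both date layouts and the ratios when parsing the splits page. It assumes the page gives ratios as `new:old` text such as `2:1` or `102:100`; the function names are hypothetical, and `%b` assumes an English locale.

```python
import datetime

# The two date layouts seen on the page: "%b %d, %Y" normally,
# "%Y-%m-%d" for the 1960s entries.
DATE_FORMATS = ("%b %d, %Y", "%Y-%m-%d")


def parse_split_date(text: str) -> datetime.date:
    """Try each known layout in turn; fail loudly on anything else."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised split date: {text!r}")


def parse_ratio(text: str) -> float:
    """'2:1' -> 2.0 new shares per old share; '102:100' -> 1.02."""
    new, old = text.split(":")
    return float(new) / float(old)
```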
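Once the splits are known, one way to apply them is to express every earlier price in post-split terms, dividing it by the product of all later split ratios. A minimal sketch; the prices in the usage lines are made up, and only the 24 April 2002 date and 2:1 ratio come from the DF example.

```python
import datetime
from typing import Dict, List, Tuple


def split_adjust(prices: List[Tuple[datetime.date, float]],
                 splits: Dict[datetime.date, float]) -> List[Tuple[datetime.date, float]]:
    """Express all prices in post-split terms.

    `splits` maps the first trading date on the new basis to the ratio
    (2.0 for 2:1). Every price strictly before that date is divided by it.
    """
    adjusted = []
    for date, price in prices:
        factor = 1.0
        for split_date, ratio in splits.items():
            if date < split_date:
                factor *= ratio
        adjusted.append((date, price / factor))
    return adjusted


# Made-up prices either side of a 2:1 split effective 24 April 2002.
prices = [(datetime.date(2002, 4, 23), 60.0), (datetime.date(2002, 4, 24), 30.5)]
print(split_adjust(prices, {datetime.date(2002, 4, 24): 2.0}))
```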
Downloading the dividend data was straight-forward. Checking it was a little trickier. After an anticipated dividend is paid, one would expect that the share price would have fallen by that amount, but I couldn't find clear examples of this. Instead I thought I'd first check that I agreed with Yahoo's adjusted close calculations.
Eg 1. AVB, 2008-12-24, dividend of $2.7 per share, closing value 57.6, previous close 60.53. So the real decrease in value is 60.53 - 57.6 - 2.7 = 0.23. 0.23/60.53 = 0.38% fall. Yahoo's adjusted close moved from 54.03 to 53.81; that's 0.41%. Probably near enough.
Eg 2. AIG, 2008-09-03, dividend of $4.4 per share; wow! Closing value 22.58, previous close 21.96. [The closing value two weeks later was 2.05!] Real increase = 22.58 + 4.4 - 21.96 = 5.02. 5.02/21.96 gives a change in value of 23%. Nice! But Yahoo's adjusted close went from 434.80 to 451.60, an increase of 3.9%. Hmmm.
According to a comment on a blog post, the dividend was $0.22. Working with that: Increase = 22.58+0.22-21.96=0.84. 0.84/21.96 = 3.8% change. Much nearer.
So what's going on here? Why does Yahoo's data on the dividend amount seem to be out by a factor of 20? Well, the ratio between the adjusted close and actual price at that point is about 20. Can that be it? If so, argh! It does appear to be, but most of the dividends are round figures, so they don't look adjusted.
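The arithmetic in these examples is simple, but a helper makes it easy to repeat across many dividends. A minimal sketch, treating the dividend as cash received on the ex-dividend day (an assumption about how the check should be done), using the AIG figures quoted above:

```python
def total_return(prev_close: float, close: float, dividend: float) -> float:
    """One-day return treating the dividend as cash received on that day."""
    return (close + dividend - prev_close) / prev_close


def pct_change(before: float, after: float) -> float:
    """Simple fractional change, e.g. between two adjusted closes."""
    return (after - before) / before


# AIG, 2008-09-03, figures as quoted in the post.
print(f"{total_return(21.96, 22.58, 4.40):.1%}")  # 22.9% with Yahoo's $4.40 dividend
print(f"{total_return(21.96, 22.58, 0.22):.1%}")  # 3.8% with the $0.22 dividend
print(f"{pct_change(434.80, 451.60):.1%}")        # 3.9% from Yahoo's adjusted close
# Scaling Yahoo's dividend down by the adjusted/actual price ratio (about 19.8):
print(f"{4.40 / (434.80 / 21.96):.3f}")           # 0.222
```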
In an attempt to get some definitive info I went to the AIG website. It was down. You don't get much for $85,000,000,000 these days. I went to Google Finance. The data is the same as Yahoo's. But then I realised that Google adjusts all its historical price data, instead of just having an adjusted close column as Yahoo does!
Okay, at this point I decided that the dividend payments don't make enough difference to the comparison of strategies to worry about for now. I will fix it up, but not yet.
Incorporating the stock split data into my simulations, I got the following results for S&P 500 companies between 20 March 2002 and 21 May 2010:
There are 458 companies in the current S&P 500 for which I have data for this time period.
- Holding $1 in each would return $947
- Holding when SMA10 > EMA30 would return $649
- Holding when SMA10 is 1% greater than EMA30 would return $591
- Holding when SMA10 is 2% greater than EMA30 would return $545
It would be nice to have some data in each case about how much of the time the money was in stocks. You might expect that, if the efficient market hypothesis were true, the fraction of the buy-and-hold gain or loss that each strategy achieves would be proportional to the time for which it holds the stocks.
It might make sense to invest a fixed amount each time we re-invest in a stock, rather than allocating a set fund to each stock and re-investing whatever it returned previously; in that case, making sure we don't use too much capital would be a good idea.
Working with the S&P 500 constituents as of 2002, instead of the companies that were in it both then and now, would reduce some of the bias.
A cleverer strategy could calculate the price at which the averages would cross, and buy during the day if that price is reached.
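If the time-in-market data were recorded, a rough benchmark follows from one simple model: if a stock grew at a constant daily rate, holding it for a fraction f of the days would give the buy-and-hold growth multiple raised to the power f (so log growth is proportional to time held). Both the model and the helper names are assumptions for illustration.

```python
from typing import Sequence


def fraction_held(holding: Sequence[bool]) -> float:
    """Share of trading days on which the strategy was in the stock."""
    return sum(holding) / len(holding) if holding else 0.0


def time_scaled_benchmark(buy_and_hold_multiple: float, fraction: float) -> float:
    """Growth multiple expected from holding for `fraction` of the time,
    assuming the stock grows at a constant daily rate (a simplification)."""
    return buy_and_hold_multiple ** fraction
```

A strategy that beats `time_scaled_benchmark` for its own `fraction_held` would be doing better than its time in the market alone explains.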
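Such a trigger price can be solved for directly. Write S for the sum of the last nine closes, E for yesterday's EMA30 and alpha = 2/31 (assuming the usual EMA smoothing factor). If today closes at P, the new SMA10 is (S + P)/10 and the new EMA30 is alpha * P + (1 - alpha) * E, so the averages cross at P = ((1 - alpha) * E - S/10) / (1/10 - alpha). A sketch, not tested against real data:

```python
from typing import Sequence


def crossing_price(recent_closes: Sequence[float], prev_ema: float,
                   short_period: int = 10, long_period: int = 30) -> float:
    """Price P at which SMA(short_period) and EMA(long_period) would be equal
    if the current day closed at P.

    Assumptions: EMA smoothing factor alpha = 2 / (long_period + 1), and
    `recent_closes` ends with yesterday's close (at least short_period - 1 values).
    """
    n = short_period
    if n < 2 or len(recent_closes) < n - 1:
        raise ValueError("need short_period >= 2 and short_period - 1 recent closes")
    alpha = 2.0 / (long_period + 1)
    kept = sum(recent_closes[-(n - 1):])  # closes that stay in the SMA window
    denom = 1.0 / n - alpha
    if abs(denom) < 1e-12:
        raise ValueError("SMA and EMA respond identically to today's price")
    # Solve (kept + P) / n == alpha * P + (1 - alpha) * prev_ema for P.
    return ((1 - alpha) * prev_ema - kept / n) / denom
```

Because today's price carries weight 1/10 in the SMA but only 2/31 in the EMA, prices above P put the SMA above the EMA; so if the SMA is currently below, reaching P during the day would be the buy signal.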
The code for this project should be tidied up so that it can be made public.