Last last week I happened upon a headline on dzone.com that caught my attention: a Monte Carlo experiment. The article featured a project from developer Alosh Bennett who used java to create a Monte Carlo web app to allow a user to use a number of random plots and to see how the result compared with PI. I thought that replicating this app in Microsoft Excel might be a great way to pass the time and learn something over the weekend.
There's a lot to the project so my intent is to give a very high level storyline of the project. If you have questions on details feel free to reach out.
I started with setting up some basic XY data and a chart to begin modeling what I wanted the finished product to look, I realized that although not needed, having a circle plotted on the grid would be a nice visual to understand the boundary between the random plots that are in or out of the circle, and equivalent to Alosh's chart.
After completing the first XY scatter plot chart I realized that I needed a couple chart series to setup the plots that are colored depending on whether they were in or out of the circle (actually the distance of the plot from the center of the circle 5000, 5000).
If you want to follow along, download the worksheet and look at Sheet3 where I preserved the logic that is all driven by Excel formulas. I used the RANDBETWEEN function to populate the x and y axis of the scatter plot. In our case we need to populate each cell in the x and y columns with '=RANDBETWEEN(0, 10000)'.
Then you need to calculate the distance between the plot represented by the random xy plot from the center of the circle, 5000, 5000 in my case. I had to look up the formula ... The formula is distance = SQRT((Y2-Y1)*(Y2-Y1) +(X2-X1)*(X2-X1)) . In our case the Y2 and X2 is 5000 so our formula looks like distance = SQRT((5000 - Y1) * (5000 - Y1) + (5000 - X1) * (5000 - X1)) . We can shorten and make it suitable for Excel by raising each to the 2nd power and to get the square root in excel you need to raise to the 1/2 power. This looks like '=((5000-D8)^2+(5000-C8)^2)^0.5'.
Now that you have a column of numbers in Sheet3 column E, the numbers smaller than 5000 are inside the circle and larger are outside. But if we charted only the xy plots from column C and D we don't have a clear path to create the two colored chart like Alosh's. So I added columns F G and H I for the purpose of creating two uniquely colored plots. To populate each I need to evaluate whether the distance, in column E, was greater or less than 5000 ... pretty simple. The formula looks like '=IF(E11<5000, C11, NA())'. That along with the circle gives you equivalent of Alosh's chart.
The final output of Alosh's app is the figure tells us the ratio of plots inside vs outside the circle. Since we know the plots in or out we can calculate the last figure that is is otherwise PI in the formula.
Cool. This was great but I quickly found out that doing trials with big numbers became slow. I like fast machines so I proceeded to do the math in VBA and place the values on to the worksheet. The VBA version is Sheet1.
Before I was finished this project I made a mistake in a formula and it was giving me output that wasn't quite equivalent to Alosh's app. So I thought that maybe there was an issue with Excels random numbers. In the end I found that I had made a mistake in my formula however now I have a side by side random integers generated by Excel and Python so I could test the difference.
Anyway, I cloned Sheet1 to Sheet2 where I modified the VBA to reached out to Python for the random numbers instead of creating the numbers in Excel. To do this I used xlwings, an open source project started by Felix at Zoomer Analytics together with Python (Anaconda distro v 3.5) to retrieve random numbers from Python. Xlwings has two components: an Excel add-in and a Python module. Go to http://xlwings.org to get the Excel download and instruction to configure. Regardless of the Python you are using you will need to install the xlwings module using 'pip' or in my case using 'conda'. The other modules you will need in Python are Numpy (generates the random numbers) and Pandas (structures the data). All this requires a bit of configuration and you need to create a python script but once you are finished it works really great.
I'm kind of a noob at python but it's fun and easy to pick up so here's my python script :
import numpy as np
import pandas as pd
from pandas import DataFrame
from numpy.random import random_integers
from xlwings import Workbook, Range
wb = Workbook.caller()
i = Range('Sheet2', 'D3').value
df1 = pd.DataFrame(random_integers(0,10000, i))
df2 = pd.DataFrame(random_integers(0,10000, i))
Range('Sheet2', 'C8', index=False, header=False).value = df1
Range('Sheet2', 'D8', index=False, header=False).value = df2
If you download the python script from here you will need to remove the '.txt' that Drupal added during the upload for security.
It's pretty straight forward if you have munged around any code .. and you will note that the Python xlwings module allows for a great deal of similarity in conventions when exchanging data between Excel and Python.
I'm going to stop for now and will finish this post later. Let me know what you think about this project.