# python

## Writing a Scientific Software Package: Open-Sourcing

UPDATE (2018 Jun 6) – I’m trying to learn how to use Sphinx to create documentation from source-code docstrings. The Sphinx manual is impressively opaque. Some googling turned up this document, which seems to provide useful details.

UPDATE (2018 Jun 4) – I’ve found some new resources that are much more up-to-date:

To set up my package so it can be installed via pip, I’m following the somewhat outdated tutorial here – https://python-packaging.readthedocs.io/en/latest/index.html.

I’ll make more notes and update this post as I go along.

## Writing a Scientific Software Package: Configuring vim

As the second entry in my series on learning how to write scientific software, I’m going to describe choosing and configuring my integrated development environment or IDE. This is the program I’ll use to write and edit the source code for my project. It’s more-or-less a fancy text editor.

My text editor of choice is vim or vi improved. It’s highly customizable, powerful, and there’s lots of online help to use it. However, it has a very steep learning curve and the commands, while fast to type out, can be very cryptic. In fact, vim is famously difficult to exit.

In any case, I started using vim back when “Pirates of the Caribbean” movies were still good, so while I’m no vim guru, I feel pretty comfortable at least exiting the program.

Opening a terminal window in Mac, I fire up vim just by typing “vim” or “vi”. That opens an editor window. vim can automagically interpret and color source code, as follows:

A screenshot from vim showing syntax-colored source code.

To turn on syntax coloring, go to your home directory (type “cd” in the terminal window), and edit the vim configuration file .vimrc by typing “vi .vimrc”. That should open an editor window.

Then, in vim, type the letter “i” (that starts “insert” or edit mode, allowing you to enter text into the file) and type “syntax on” <ENTER> “filetype indent plugin on”, giving a file that looks like this:

Press the escape key (which exits insert mode and enters command mode). Save the changes by typing “:wq”. That should save and exit.

Unlike many other languages, Python considers whitespace in its interpretation, and the Python style guide recommends using four spaces for each level of indentation. It would be nice to have the tab key implement that spacing in vim. Ynfortunately, my vim by default inserts eight spaces for each press of the tab key.

But you can modify that behavior by adding file-type plugin files to a special vim configuration folder. I followed the instructions here to create an ftplugin directory inside the .vim directory (by typing “cd && cd .vim && mkdir ftplugin && cd ftplugin” in the terminal window). Then, inside the directory, I created python.vim (“vi python.vim”) and again pressed “i” to enter insert mode.

I typed the following lines into the file:

Then pressed the escape key and typed “:wq”.

Next, I tested the new configuration by typing “vi test.py” (the “.py” is important because that’s how vim knows you are editing a python file and want to use the new python configuration). I pressed tab and got four spaces instead of eight.

I’m sure there are other configuration settings that would be useful, but this’ll do for now.

UPDATE – 2018 May 17: I found this excellent website – http://docs.python-guide.org/en/latest/, which addresses vim set-up, as well as a number of other issues.

## Writing a Scientific Software Package: Motivation and Background

Ancient software developers meticulously punched holes in paper cards to write programs.

In the eons before my graduate career, scientists rarely, if ever, publicly distributed their codes, with authors zealously guarding their coding projects.

But just as I was finishing my PhD, it was becoming common for scientists to make the code they developed as part of a published project readily available on the internet. However, the methods to post code online (at least those I knew about) were pretty clunky.

Nowadays, the infrastructure for posting and sharing code online is robust, mature, and relatively easy to use. Consequently, scientists are creating beautiful code repositories, along with accessible documentation.

Open-sourcing code is becoming ever more important: as codes become more complex and capable, readily available codes with good documentation are critical to support reproducibility, a cornerstone of the scientific process. Moreover, federal funding agencies are starting to require investigators to make their code and data products public.

Unfortunately, since I was one of the last generation of grad students before these repositories were common, I never really learned how to distribute and document code properly.

So as part of an ongoing effort to improve my science output (and as an aide to my future students), I’m going to begin a series of semi-regular blog posts describing my process of learning how to write, document, and post scientific code.

A few caveats upfront:

• I intend to mostly (probably exclusively) write the code in python, which has become (at least in astronomy) the language of choice, so not all of what I write will be generally relevant.
• I was ushered into the Cult of Mac many years ago, so not all of what I write will be relevant for other OS’s. Here again, though, I’ve found anecdotally that most astronomers use Mac.
• This blog series is in no way intended to be comprehensive or rigorous. I’m just planning to describe what I learn as I go along, and what time I can devote will almost definitely not suffice to explain all the details, nuances, or technical aspects that intersect the project.

As to the actual science code I intend to write, several years ago my colleagues and I wrote a paper about ellipsoidal variations induced by massive exoplanets orbiting very close to their host stars. The accompanying code, EVIL-MC was written in IDL, an older language still widely used in astronomy but proprietary and requiring the purchase of an expensive site license.

My plan is to convert that IDL code into a Python package over the next several weeks.

EVIL-MC – Ellipsoidal Variations Induced by a Low-Mass Companion

Tidal distortion (exaggerated) of a star (orange-yellow disk) orbited by planet (white/black disk). The plot below shows the brightness variation of the star due to the tidal distortion.

## Uniform Circular Motion Animation in Python

I’m prepping for my classical mechanics course, scheduled to start next week. One of the first things we discuss is uniform circular motion and how it looks projected along the x- and y-axes, so I thought it would be useful to have an animation showing that. I found a few animations online, but none really showed the x and y projections I was looking for.

So I decided to create my own using my go-to language of choice Python. Fortunately, python guru Jake Vanderplas has created a very nice animation module usable in iPython Notebooks.

Based on his example here, I put together the following code to generate the desired animation:

## Reese’s Pieces and Bayesian Inference

From http://en.wikipedia.org/wiki/Reese%27s_Pieces#/media/File:Reeses-pieces-loose.JPG.

I eat Reese’s pieces almost every day after lunch, and they come in three colors: orange, yellow, and brown.

I’ve wondered for a while whether the three colors occur in equal proportions, so for lunch today, I thought I’d try to infer the occurrence rates using Bayes’ Theorem.

Bayes’ Theorem provides a quantitative way to update your estimate of the probability for some event, given some new information. In math, the theorem looks like

$latex P\left( H | E \right) = \dfrac{ P\left( E | H \right) P\left( H \right)}{P\left( E \right)},$

The probability for event $latex H$ to happen, given that some condition $latex E$ is met, is the probability that $latex E$ is met, given that $latex H$ happened, times the probability for $latex H$ to happen at all, and divided by the probability for $latex E$ to be met at all.

The $latex P(H)$ and $latex P(E)$ are called the “priors” and often represent your initial estimates for the probability that $latex H$ and $latex E$ occur. $latex P\left(E | H \right)$ is called the “likelihood”, and $latex P(H | E)$ is the “posterior”, the thing we know AFTER $latex E$ is satisfied. $latex P(H | E)$ is usually the thing we’re trying to calculate.

So for my case, $latex P(H)$ will be the frequency with which a certain color occurs, and $latex E$ will be my experimental data.

For a given frequency $latex f_{\rm orange}$ of oranges (or browns or yellows), the probability $latex P(f_{\rm orange} | E)$ that I draw $latex N_{\rm orange}$ oranges is  ~ f^N (1 –  f)^N(not orange). As I select more and more candies, I can keep re-evaluating $latex P$ for the whole allowed range of f (0 to 1) and find the value that maximizes $latex P$.

Closing my eyes, I pulled ten different candies out of the bag, with following results in sequence: brown, orange, orange, yellow, orange, orange, orange, brown, orange, yellow, orange. These results obviously suggest orange has a higher frequency than yellow or brown.

This ipython notebook implements the calculation I described, and the plots below show how $latex P$ changes after a certain number of trials $latex n_{\rm trials}$:

Applying Bayesian inference to determine the frequency of Reese’s pieces colors.

So, for example, before I did any trials $latex n_{\rm trials} = 0$, I assumed all colors were equally likely. After the first trial when I chose a brown candy, the probability that brown has a higher frequency than the other colors goes up. After three trials (brown, orange, orange), orange takes the lead, and since I hadn’t seen any yellows, there’s a non-zero probability that yellow’s frequency is actually zero. We can see how the probabilities settle down after ten trials.

Based on this admittedly simple experiment, it seems that oranges have a frequency about twice that of yellows and browns. Although not as much fun, if I’d bothered to check wikipedia, I would have seen that “The goal color distribution is 50% orange, 25% brown, and 25% yellow” — totally consistent with my estimate.

## Labeled scatter plot using Pylab

A preliminary investigation showed a project a colleague and I were considering probably isn’t worth doing. But for that investigation, I took a few hours to make a rather complicated plot using pylab, so I thought I’d share how I did that.

First, here’s the plot:

Tidal decay timescales for members of multi-planet systems.

The plot shows the timescales for tidal decay of members of multi-planet systems. Unfortunately, the x-axis labels aren’t legible unless you zoom in, but if you do, you can see the font colors match up with the corresponding line colors.

Below is the ipython notebook I used to generate the plot. The excel spreadsheet with the data is here.

#Show the plot inline
%matplotlib inline

import pandas as pd
import pylab as pl
import itertools as it
import numpy as np

# using the ExcelFile class
xls = pd.ExcelFile('exoplanet-archive_2015Mar25.xlsx')
data = xls.parse('obj of interest', index_col=1)
data = data[pd.notnull(data['a/(da/dt)Qs=1e6 (Gyrs)'])]

#Make a nice, big figure
fig = pl.figure(figsize=(15,15))

#Make a list, indexing the dataframe labels
indices = range(len(set(data.index)))

#Make a list with the indices as the entries
labels = list(set(data.index))
#For concision, drop "Kepler" wherever it's found
labels = [w.replace('Kepler-', '') for w in labels]

#Make a cycle of line and text colors, blue, green, red, etc.
colors = it.cycle(['b', 'g', 'r', 'c', 'm', 'y', 'k'])

#Since each member of the multi-system should be plotted with the same x-value,
#  I need to generate a new list of all the same value with as many entries
#  as members. That's what "i" is for.
i = 0
for unq in set(data.index):

#Retrieve the decay timescales calculated in the spreadsheet
taus = data.loc[unq, 'a/(da/dt)Qs=1e6 (Gyrs)']

#Generate the list of all the same x-value
idx = np.ones_like(taus)*i

#Make the scatter plot points with the current color
ax.semilogy(idx, taus, marker='o', color=cur_color)

#Get the next line color
cur_color = next(colors)

#Next x-value
i += 1

#Give a little space to the left and right of the first and last x-values
pl.xlim([-1, len(set(data.index))+1])

#Switch out the x-values with the system names
pl.xticks(indices, labels, rotation='vertical', size='small', ha='center')

#Increase the size of the y-axis label font
pl.yticks(size=36)
#Label the y-axis
pl.ylabel('$a/\\left(\\frac{da}{dt}\\right)_{Q_{\\rm s} = 10^6}$ (Gyrs)', fontsize=36)

#Reset the colors cycle
colors = it.cycle(['b', 'g', 'r', 'c', 'm', 'y', 'k'])

#Set a new color for each x-axis label
for tick in ax.xaxis.get_ticklabels():
tick.set_color(cur_color)
cur_color = next(colors)

pl.savefig('Comparing multi-planet system a_dadt.png', bbox_inches='tight', orientation='landscape', dpi=250)


## Naming Pluto’s Surface Features and more

Artist’s conception of Pluto’s and Charon’s surfaces. From http://www.ourpluto.org/home.

We talked briefly about several things at Friday’s Journal Club. First, we discussed astrobites.org, a great blog that covers the interesting nitty-gritty of astronomy research. I pointed out that they are requesting submissions from undergrad researchers.

Second, we discussed the New Horizons mission’s request for suggestions for names of features on the surface of Pluto and its moons. After the mission flies by the system, there will be mounds of high resolution images, probably showing a variety of complex surface morphologies. And all that stuff is going to need names.

Third, Jacob presented a recent paper that extends the Titius-Bode relation to extrasolar systems and predict there are about 2 planets in habitable zones per star in our galaxy. A potentially fascinating result, but unfortunately, the T-B relation is probably just an interesting coincidence for our solar system — it has no theoretical basis, and so there’s no reason to believe it can be generalized to other planetary systems. Nevertheless, the article got a lot of press last week.

Finally, we talked about coding in astronomy, and I wanted to post this resource I just heard about, https://python4astronomers.github.io/. Looks to have a lot of helpful tutorials relevant to astronomy.

Friday’s attendees included Jennifer Briggs, Trent Garrett, Nathan Grigsby, Tanier Jaramillo, Emily Jensen, Liz Kandziolka, and Jacob Sabin.