Thoughts, ramblings, and musings

In this post, I show that search-based identification of peer firms yields groups that transcend traditional industry classification codes, and I use stock return data to show that these groups are more economically related. This suggests that investors may be better served by novel means of finding comparable companies than by relying solely on traditional Standard Industrial Classification (SIC) codes. Building on my last post, SEC Filings Graph, which explored one year of search logs from the SEC’s EDGAR website to identify related companies, I expand the data to searches from January 2008 through March 2017 and implement quantitative measures of economic relatedness between the firms within a peer group. The resulting dataset (previous dataset) contains **407,574,804 (52,611,188) searches for 9,583,773 (5,388,574) reports by 3,043,255 (476,827) users**. My work is inspired by, and roughly follows, the methodology of Lee et al., who studied search data from 2008 through 2011.
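One way to quantify economic relatedness within a peer group is the average pairwise correlation of the firms’ stock returns: search-derived peers should co-move more than randomly grouped firms. The sketch below illustrates this idea on simulated return data (the data and the exact metric are hypothetical stand-ins, not the post’s actual measure):

```python
import numpy as np
from itertools import combinations

def mean_pairwise_correlation(returns):
    """Average pairwise correlation of the columns (one column per firm).

    A peer group whose firms are more economically related should score
    higher than a randomly drawn group of firms.
    """
    corr = np.corrcoef(returns, rowvar=False)
    n = corr.shape[1]
    return np.mean([corr[i, j] for i, j in combinations(range(n), 2)])

# Hypothetical example: two "peers" driven by a shared factor, one unrelated firm
rng = np.random.default_rng(0)
common = rng.normal(size=120)  # shared sector factor, 120 monthly returns
peers = np.column_stack(
    [common + rng.normal(scale=0.5, size=120) for _ in range(2)]
)
unrelated = rng.normal(size=(120, 1))

print(mean_pairwise_correlation(peers))                          # high
print(mean_pairwise_correlation(np.hstack([peers, unrelated])))  # diluted
```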

Graphs let us model the world as it is, with nodes that represent physical entities and edges that connect them based on their relationships. Inspired by the work of Lee et al., I took the Thanksgiving weekend to build a graph of SEC filings. In this post, I use the EDGAR Log File Dataset, from April 2016 through March 2017, and Neo4j to build a graph of **52,611,188 searches for 5,388,574 reports by 476,827 users**. The graph reveals interesting connections between firms and could be extended with more relationships and entities from other domains, such as textual similarity of filings, employment histories of material employees, and patent applications.
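The heart of this kind of graph is a co-search edge: two firms are linked whenever the same user views filings of both, typically within a short window. A minimal sketch in plain Python (the row layout and one-day session window are my assumptions, not the post’s exact construction):

```python
from collections import Counter
from itertools import combinations

def cosearch_edges(log_rows):
    """Build weighted firm-firm edges from EDGAR log rows.

    Each row is (user_id, date, cik).  Two firms are connected whenever
    the same user viewed filings of both on the same day; the edge weight
    counts how many (user, day) sessions co-searched them.
    """
    by_session = {}
    for user, date, cik in log_rows:
        by_session.setdefault((user, date), set()).add(cik)

    edges = Counter()
    for ciks in by_session.values():
        for a, b in combinations(sorted(ciks), 2):
            edges[(a, b)] += 1
    return edges

rows = [
    ("u1", "2016-04-01", "0000320193"),  # Apple
    ("u1", "2016-04-01", "0000789019"),  # Microsoft
    ("u2", "2016-04-02", "0000320193"),
    ("u2", "2016-04-02", "0000789019"),
]
print(cosearch_edges(rows))  # one edge with weight 2
```

These weighted edges map directly onto Neo4j relationships between company nodes.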

The research I’m working on involves estimating a firm’s probability of default over a variety of time horizons using the Merton Distance to Default model. The dataset contains daily financial information for more than 24,000 firms over the past 30 years. Given that I am calculating the probability of default over five time horizons, applying the Merton model will require solving the Black-Scholes equation roughly 305 million times. Luckily, the model is easily parallelized because the only data needed for the model, aside from the risk-free rate, is firm-specific. This post shows how the Python library Pywren can leverage AWS Lambda to run hundreds of models in parallel, achieving a **270x speed-up** over a quad-core i7-4770, with minimal changes to the simulation code. If you are interested in learning more about the model, see my post about implementing the model in Python.
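Because each firm-day solve depends only on firm-specific inputs plus the risk-free rate, the workload is embarrassingly parallel and reduces to a single `map` over firms. The sketch below uses the standard library’s `concurrent.futures` as a local stand-in for that pattern; with Pywren, a cloud executor from `pywren.default_executor()` would take the pool’s place. The model function here is a placeholder, not the actual Merton solve:

```python
from concurrent.futures import ProcessPoolExecutor

def solve_default_prob(firm_data):
    """Placeholder for the Merton model solve for one firm.

    Only firm-specific data (plus the risk-free rate) is needed, which is
    what makes the workload embarrassingly parallel.
    """
    equity, sigma_e, debt, r = firm_data
    return equity / (equity + debt)  # stand-in for the real computation

firms = [(5.0, 0.8, 10.0, 0.05), (3.0, 0.5, 4.0, 0.05)]

if __name__ == "__main__":
    # Locally: a process pool.  With Pywren, each call runs in its own
    # AWS Lambda invocation instead of a local worker process.
    with ProcessPoolExecutor() as pool:
        probs = list(pool.map(solve_default_prob, firms))
    print(probs)
```

The appeal of Pywren is that the serial code barely changes: the same function is handed to `map`, only the executor differs.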

One of the most effective methods for rating credit risk is built on the Merton Distance to Default model, also known as simply the *Merton Model*. While implementing it for some research, I was disappointed by how little information and how few reference implementations of the model are readily available on the internet, given how ubiquitous the model is. This post walks through the model and an implementation in Python that makes use of NumPy and SciPy.
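The core of the model is a two-equation solve: equity is priced as a Black-Scholes call on the firm’s assets, and Ito’s lemma links equity volatility to asset volatility. The unobserved asset value and volatility are recovered numerically, then plugged into the distance-to-default formula. A minimal sketch under standard textbook notation (variable names and starting values are mine; the post’s implementation may differ):

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

def merton_distance_to_default(E, sigma_E, D, r, T=1.0):
    """Back out asset value V and asset volatility sigma_V from observed
    equity value E and equity volatility sigma_E, then return the
    distance to default and implied default probability for debt face
    value D at horizon T."""
    def equations(x):
        V, sigma_V = x
        d1 = (np.log(V / D) + (r + 0.5 * sigma_V**2) * T) / (sigma_V * np.sqrt(T))
        d2 = d1 - sigma_V * np.sqrt(T)
        # 1) Equity as a Black-Scholes call on the firm's assets
        # 2) Ito's lemma linking equity and asset volatility
        return [V * norm.cdf(d1) - D * np.exp(-r * T) * norm.cdf(d2) - E,
                V * norm.cdf(d1) * sigma_V - sigma_E * E]

    V, sigma_V = fsolve(equations, x0=[E + D, sigma_E * E / (E + D)])
    dd = (np.log(V / D) + (r - 0.5 * sigma_V**2) * T) / (sigma_V * np.sqrt(T))
    return dd, norm.cdf(-dd)  # distance to default, implied default prob

dd, pd = merton_distance_to_default(E=5.0, sigma_E=0.8, D=10.0, r=0.05)
print(dd, pd)
```

Note the sign difference between the pricing drift (`r + 0.5·σ²`) and the distance-to-default drift (`r − 0.5·σ²`), a common source of bugs.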

One challenge facing companies that manufacture products with highly seasonal demand is the choice of production schedule. One option is to make a fraction of the expected demand each month leading up to the peak season. But when demand is uncertain, this approach can lead to large overages or shortages that erode margins. Firms that find themselves in this situation must balance the costs of ramping production to meet demand during peak season, additional inventory holding costs, and the risk of uncertain demand.

I use the case of Play Time Toy Company^{1} to explore solutions to this dilemma, specifically by applying Monte Carlo analysis to assess the impact of production scheduling on firm profitability and capital requirements.
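The Monte Carlo comparison can be sketched by simulating uncertain seasonal demand and computing the profit distribution for a few fixed production quantities. All cost and demand parameters below are hypothetical, not the case’s actual figures:

```python
import numpy as np

def simulate_profit(production, price=10.0, unit_cost=6.0,
                    holding_cost=0.5, n_sims=10_000, seed=0):
    """Profit distribution for a fixed production quantity under
    normally distributed seasonal demand (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    demand = np.maximum(rng.normal(loc=1000, scale=250, size=n_sims), 0)
    sales = np.minimum(demand, production)       # can't sell what wasn't made
    leftover = production - sales                # unsold units incur holding cost
    return price * sales - unit_cost * production - holding_cost * leftover

# Compare a conservative, an expected-demand, and an aggressive schedule
for q in (800, 1000, 1200):
    profit = simulate_profit(q)
    print(q, round(profit.mean(), 1), round(profit.std(), 1))
```

Producing beyond expected demand buys upside in good seasons at the cost of overage risk; the simulated mean and standard deviation make that trade-off explicit.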

This post examines how a company’s formal structure, set by its managers, can differ drastically from the informal structure that dictates how information actually flows within it.

The need to make trade-offs between the effort exerted on specific activities is felt universally by individuals, organizations, and nations. In many cases, activities are mutually exclusive, so partaking in one option excludes participation in another. Deciding how to make these trade-offs can be immensely difficult, especially when we lack quantitative data about risk and reward to support our decision.

In the case of financial assets, there is a wealth of numerical data available. So in this post, I explore how historical data can be leveraged to choose specific mixes of assets based on investment goals. The tools used are quite simple and rely on the mean and variance of assets’ returns to find the efficient frontier of a portfolio.
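The mean-variance machinery can be sketched by sampling random long-only portfolios and looking at the upper-left envelope of the (volatility, return) cloud, which approximates the efficient frontier. The asset statistics below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical annualized mean returns and covariance for three assets
mu = np.array([0.06, 0.10, 0.14])
cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

def portfolio_stats(w):
    """Expected return and volatility of a portfolio with weights w."""
    return w @ mu, np.sqrt(w @ cov @ w)

# Sample random long-only portfolios (weights sum to 1); plotting the
# cloud of (vol, return) points traces out the attainable set.
weights = rng.dirichlet(np.ones(3), size=5000)
stats = np.array([portfolio_stats(w) for w in weights])
best = stats[np.argmin(stats[:, 1])]  # approximate minimum-variance portfolio
print("min-vol portfolio: return %.3f, vol %.3f" % (best[0], best[1]))
```

Diversification shows up immediately: the minimum-variance mix is less volatile than the least-volatile single asset.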

*Note: The code and data used to generate the plots in this post are available here.*