Last week I said I would discuss the language Julia and my thoughts on it for the ULISSES Project. Related to this, I would like to explain why our new company will be using Julia for certain projects. This is in contrast to the ULISSES Project, which will continue to use the courses that Hal Tolley and I developed purely in Python. I will break down the reasons for this divergence from Python to Julia into three areas:
- Unification of Parameterized Codebase
- Speed of running Unified Models
- Momentum and Adoption – the S-Curve
Unification of Parameterized Codebase
Complex modeling began in two main areas: macroeconomics and weather prediction. In both, users initially attempted to build complex models, but soon realized that the limitations of hardware and software would require a parameterization of those models. Parameterization is now so common that all but the most senior subject-matter experts believe it is simply the way things were meant to be. Yet at the beginning of modeling in these areas, the experts knew that complex systems needed to be built to represent reality more closely, something that would inevitably result in more accurate prediction.
While scripting languages like Python accommodate “fast and dirty” research code, they do not accommodate complexity well. Among the reasons for this is the permissive typing of languages such as Python, which in effect guarantees problems in areas that require extreme precision, and complex modeling is exactly such an area. Julia is strongly typed natively and is built to accommodate computationally complex mathematical modeling. This is precisely why more and more macroeconomists and weather forecasters are moving to Julia, a shift manifested in efforts such as CLIMA and new tools such as JuMP. In short, these groups in macroeconomics and climate research are unifying long-standing codebases written in languages such as FORTRAN and MATLAB into Julia.
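To make the typing contrast concrete, here is a minimal sketch in Julia (the function name and numbers are hypothetical, purely illustrative). Because the argument types are declared in the signature, a call with the wrong types fails immediately with a MethodError instead of silently coercing the way a permissively typed script might:

```julia
# Hypothetical illustration: annotated argument types mean a mismatched
# call fails loudly up front rather than producing a subtly wrong result.

function discounted_sum(cashflows::Vector{Float64}, rate::Float64)
    total = 0.0
    for (t, c) in enumerate(cashflows)
        total += c / (1 + rate)^t   # discount each cashflow to present value
    end
    return total
end

discounted_sum([100.0, 100.0, 100.0], 0.05)    # works as intended
# discounted_sum(["100", "100", "100"], 0.05)  # MethodError: no matching method
```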
Speed of running Unified Models
In the past, compromises were made on speed because it was always believed that processors would continue to increase in speed by 20-40% per year, but we are now approaching an entire generation after the end of Moore’s Law, and we still see programmers writing extremely inefficient code. Nothing seems to accommodate inefficient code better than Python. In general, this is fine for research code, but for production code in areas such as finance, weather prediction and epidemiological research, time matters. Seconds can make the difference between life and death, or between profit and loss. Interpreted languages such as Python are easy to program in, but take massive amounts of time to run, while lower-level code can run orders of magnitude faster. I mentioned this in one of my earlier blog entries, but the interesting thing is that Julia does not suffer from those same problems.
While it takes quite a while for Julia code to compile, once compiled it runs at a speed comparable to C. This is precisely why groups such as CLIMA are looking to unify long-standing codebases in Julia: not only to make maintaining and updating those codebases easier, but to have mission-critical models run faster. In areas such as weather forecasting, it can take supercomputers up to twelve hours to run simulations, in effect making predictions of limited value and requiring parameterized kludges. With unified codebases in Julia, researchers hold out hope for true real-time forecasting, something that could not only be lucrative in areas such as finance, but could also save lives in areas such as epidemiology and weather forecasting.
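As a rough illustration of that compile-once, run-fast behaviour (a hypothetical toy kernel, not a rigorous benchmark), the first call to a Julia function pays the just-in-time compilation cost, and subsequent calls run as compiled machine code:

```julia
# Toy sketch (illustrative only): the first call pays the one-time JIT
# compilation cost; later calls run the already-compiled machine code.

function kernel(n::Int)
    acc = 0.0
    for i in 1:n
        acc += sin(i) * cos(i)   # stand-in for a heavier numerical computation
    end
    return acc
end

@time kernel(10_000_000)   # first call: includes compilation time
@time kernel(10_000_000)   # second call: compiled code only, much faster
```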
Momentum and Adoption – the S-Curve
Herbert Simon, the great pioneer of AI, called the field “complex information processing.” While this means many things, one thing it definitely means is understanding that the world is non-linear: rather than happening in a slow and gradual manner, things often build up almost imperceptibly and then hit a period of exponential growth. We saw this with the adoption of Python, when a relatively small hedge fund, AQR, unified its codebase in Python and one of its researchers, Wes McKinney, subsequently published his panel data library, Pandas, to the world. Not to take away from that work, but this and many other “innovations” were simply ports of well-established tools and libraries from R. The rapid adoption of Python for quick and dirty code came about simply because it was the right thing at the right time: an open-source tool with all of the functionality of R, but with the added benefit of being a general-purpose language.
My strong belief is that we are at the end of the S-Curve of Python adoption. Too many people have used it for the wrong purpose. Two multi-billion-dollar failures to unify codebases in Python have already occurred at two of the largest banks. While those banks often describe these efforts as “a success,” pointing only to the sheer budget spent as evidence, the resulting performance and their lag in technological leadership clearly demonstrate the failures. In short, Python is very good for some purposes and very bad for others. We see a complex future, and that future belongs more to Julia; we are betting on it.
Next week I will start a short set of entries on the complexity modeling that we will likely begin to see emerge in finance.