IBKR Quant Blog




Quant

qplum - Use of Hypothetical Data in Machine Learning Trading Strategies


Join us for a free webinar on Thursday, September 20, 2018, at 12:00 PM.

 


 


Over the last decade, deep learning has had tremendous success in pushing the state of the art in numerous domains such as computer vision, natural language processing, machine translation, and speech recognition.

All of these domains are characterized by large quantities of data. In finance, however, even 20 years of end-of-day data amounts to merely 5,000 points, and any data-driven trading strategy is only as good as the data itself. In order to leverage deep learning research to the fullest, many progressive asset managers are experimenting with different approaches to generating and using hypothetical data, so that models can learn what to do in scenarios the markets haven't seen yet. In this webinar, we will discuss the use of synthetic/hypothetical data that can potentially solve this problem.
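As a rough illustration of what generating hypothetical data can mean in practice, the sketch below block-bootstraps historical daily returns into many synthetic return paths. It is a minimal, generic Python example under our own assumptions; the function name and parameters are illustrative and do not describe qplum's methodology.

import numpy as np

def block_bootstrap_paths(returns: np.ndarray, n_paths: int = 100,
                          block_size: int = 20, seed: int = 0) -> np.ndarray:
    """Resample contiguous blocks of historical returns into synthetic paths."""
    rng = np.random.default_rng(seed)
    n = len(returns)
    n_blocks = int(np.ceil(n / block_size))
    paths = np.empty((n_paths, n_blocks * block_size))
    for i in range(n_paths):
        # Pick random block start points and stitch the blocks together.
        starts = rng.integers(0, n - block_size, size=n_blocks)
        blocks = [returns[s:s + block_size] for s in starts]
        paths[i] = np.concatenate(blocks)
    return paths[:, :n]  # trim each path to the original length

Resampling whole blocks rather than individual returns preserves some of the short-range dependence in the data, which is one reason block bootstraps are a common starting point for synthetic financial series.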

 

 

Speaker:    Ankit Awasthi, Quantitative Portfolio Manager at qplum

 

Sponsored by:   qplum

 

Information posted on IBKR Quant that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Quant are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.






Quant

Contemporary Portfolio Optimization Modeling with R



In case you missed it! The webinar recording is on IBKR’s YouTube Channel.

https://www.youtube.com/watch?v=rZcPSGoSnDQ


 

Ronald Hochreiter, Docent/Lecturer, WU (Vienna University of Economics and Business) presents the following:


In the first part of this webinar, we will review the most common ways to conduct portfolio optimization with R. After this introduction, we will offer some remarks on the modeling of portfolio problems. In the second part, we will demonstrate a revolutionary way to model and solve portfolio optimization problems using R. The basic idea is to build a portfolio optimization modeling language on top of a generalized algebraic modeling language. By focusing on several modeling and optimization approaches, the webinar is designed to provide new insights for a broad range of interested parties.

 

 

Information posted on IBKR Quant that is provided by third-parties and not by Interactive Brokers does NOT constitute a recommendation by Interactive Brokers that you should contract for the services of that third party. Third-party participants who contribute to IBKR Quant are independent of Interactive Brokers and Interactive Brokers does not make any representations or warranties concerning the services offered, their past or future performance, or the accuracy of the information provided by the third party. Past performance is no guarantee of future results.






Quant

An Introduction To Trading: Developing A Conceptual Framework - Top Programming Languages


In a recent QuantInsti article, Gaurav Raizada and Varun Divakar shared their observations on the top programming languages for developing a conceptual trading framework.

Machine Learning

Machine learning has become an important part of the skill set a quant analyst should have. The majority of strategies used in trading, whether technical, quantitative, or fundamental, can be automated and optimized. To optimize the code, you need a strong understanding of the language in which you are coding. To optimize the strategy, you need a strong understanding of its features and of the machine learning model suitable for the problem at hand. There are good machine learning algorithms on the market that are capable of gauging market sentiment from Twitter feeds or similar news feeds. Incorporating such algorithms will give you, as a trader, a significant edge over other traders.

 


 

Python/R/C++

Which language should you learn? This is probably a question that haunts every beginner in the field of quantitative trading. If you know where a particular language is used, you will be able to understand which one suits your needs better.

C++ is used extensively in strategies where the time of execution is the most important parameter. For example, in HFT (high-frequency trading) where the trades are completed in less than a second, the language you choose can make or break your strategy. In such scenarios, C++ is the best option.

Python and R are used extensively in the fields of analytics and finance. They are used in algorithmic trading, and almost every major broker in the world has an API supporting at least one of them. There is little to choose between them: both provide similar features and libraries, and both are open source. R is a well-established language in finance, while Python is a relative newcomer; however, the popularity and usage of Python have been growing exponentially.

 

Read the full article on the QuantInsti website:

https://www.quantinsti.com/blog/introduction-trading/

 

 

Visit the QuantInsti website and explore the educational offerings of their Executive Programme in Algorithmic Trading (EPAT™).

This article is from QuantInsti and is being posted with QuantInsti’s permission. The views expressed in this article are solely those of the author and/or QuantInsti and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.






Quant

K-Means Clustering: All You Need to Know - Part 2


By Uday Keith, Byte Academy

In the previous post in this series, Uday presented an overview of the K-Means algorithm.

 

Choosing the “right” K

Elbow Method

The elbow method allows us to decide on a value of K with a visual aid. We break the data into clusters for a range of values of K and plot each K against the corresponding W(Ck). An example is below.

We choose the value of K at the point where the decrease in W(Ck) begins to level off. In the example below, the optimal K appears to be 2, since the drop in W(Ck) between K = 1 and K = 2 is much larger than the drop between K = 2 and K = 3. In other words, we visually look for the "elbow" of the curve.

 

[Figure: elbow plot of W(Ck) against the number of clusters K, produced in Python]
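As a rough sketch of how such an elbow plot can be produced, the Python snippet below fits K-Means for a range of K values and plots the within-cluster sum of squares W(Ck), which scikit-learn exposes as inertia_. The toy data from make_blobs is an assumption standing in for a real dataset.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data standing in for a real feature matrix X of shape (n_samples, n_features).
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 11)
wck = []  # W(Ck): within-cluster sum of squares for each K
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wck.append(km.inertia_)

plt.plot(list(ks), wck, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("W(Ck)")
plt.title("Elbow method")
plt.show()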

 

 

Silhouette Method/Analysis

Silhouette analysis can be used to study the separation distance between the resulting clusters. The silhouette coefficient displays a measure of how close each point in one cluster is to points in the neighboring clusters and thus provides a way to assess parameters like the number of clusters. This measure has a range of [-1, 1].

Silhouette coefficients (as these values are referred to) near +1 indicate that the sample is far away from the neighboring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighboring clusters, and negative values indicate that those samples might have been assigned to the wrong cluster.

The Silhouette Coefficient is calculated using the mean within-cluster distance/variation (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient for a sample is (b – a) / max(a, b). To clarify, b is the distance between a sample and the nearest cluster that the sample is not a part of.

There is a visual component for plotting the silhouettes, which you can follow at: http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html
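A minimal sketch of silhouette analysis with scikit-learn is below: it computes the mean silhouette coefficient, i.e. the average of (b – a) / max(a, b) over all samples, for a few candidate values of K. The toy data and the range of K are illustrative assumptions.

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(2, 7):  # the silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)  # mean (b - a) / max(a, b) over all samples
    print(f"K = {k}: mean silhouette coefficient = {score:.3f}")

The K with the highest mean coefficient (and reasonably even per-cluster silhouettes in the linked plots) is usually a sensible choice.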

 

Conclusion

Clustering is challenging because, at the outset, it is not clear whether the output is going to be useful. If we were to create 3 clusters or 8 clusters with a dataset, how do we know which is the correct choice? Let us say that, using the Online Retail Dataset, we concluded that there are 6 customer types, or clusters. Based on this, the marketing department of the company sent out email advertisements to customers according to their cluster assignment. The clustering would be useful if customers interested in deals on electronic products actually received an email with those products. Such a customer would hopefully click on the advertisement and purchase an item.

Over time, we could evaluate the clustering based on the overall response of the customers to the email advertisements. Many clicks or purchases would reflect an appropriate clustering; if not, the clustering clearly needs to be adjusted.

This is exactly the challenge with clustering using K-means or any other method. While guides like the elbow or silhouette method exist, we can never be exactly sure of the validity of our clustering. Even so, K-means is a powerful and quick algorithm that, if used wisely in conjunction with domain knowledge, can produce great results.

 

Any trading symbols displayed are for illustrative purposes only and are not intended to portray recommendations.
 

Byte Academy is based in New York, USA. It offers coding education, classes in FinTech, Blockchain, DataSci, Python + Quant.

This article is from Byte Academy and is being posted with Byte Academy’s permission. The views expressed in this article are solely those of the author and/or Byte Academy and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.






Quant

Back to Basics: Backtesting in Algorithmic Trading



The articles in this series are available as follows: Part I, Part II, Part III, Part IV, Part V, Part VI, and Part VII.

 

In this post, Kris will discuss the importance of backtesting in algo trading.

Nearly all research related to algorithmic trading is empirical in nature. That is, it is based on observations and experience. Contrast this with theoretical research, which is based on assumptions, logic and a mathematical framework. Often, we start with a theoretical approach (for example, a time-series model that we assume describes the process generating the market data we are interested in) and then use empirical techniques to test the validity of our assumptions and framework. But we would never commit money to a mathematical model that we assumed described the market without testing it using real observations, and every model is based on assumptions (to my knowledge no one has ever come up with a comprehensive model of the markets based on first principles logic and reasoning). So, empirical research will nearly always play a role in the type of work we do in developing trading systems.

So why is that important?

Empirical research is based on observations that we obtain through experimentation. Sometimes we need thousands of observations in order to carry out an experiment on market data, and since market data arrives in real time, we might have to wait a very long time to run such an experiment. If we mess up our experimental setup or think of a new idea, we would have to start the process all over again. Clearly this is a very inefficient way to conduct research.

A much more efficient way is to simulate our experiment on historical market data using computers. In the context of algorithmic trading research, such a simulation of reality is called a backtest. Backtesting allows us to test numerous variations of our ideas or models quickly and efficiently and provides immediate feedback on how they might have performed in the past. This sounds great, but in reality, backtesting is fraught with difficulties and complications, so I decided to write an article that I hope illustrates some of these issues and provides some guidance on how to deal with them.

Why Backtest?

Before I get too deeply into backtesting theory and its practical application, let’s back up and talk about why we might want to backtest at all. I’ve already said that backtesting helps us to carry out empirical research quickly and efficiently.

In the world of determinism (that is, well-defined cause and effect), natural phenomena can be represented by tractable mathematical equations. Engineers and scientists reading this will be well-versed, for example, in Newton's laws of motion. These laws quantify a physical consequence given a set of initial conditions and are solvable by anyone with a working knowledge of high-school-level mathematics. The markets, however, are not deterministic (at least not in the sense that the information we can readily digest describes the future state of the market).

Backtesting on past data could help provide a framework in which to conduct experiments and gather information that supports or detracts from a conclusion.

Backtesting accuracy can be affected by:

  • The parameters that describe the trading conditions (spread, slippage, commission, swap) for individual brokers or execution infrastructure. Most brokers or trading setups will result in different conditions, and conditions are rarely static. For example, the spread of a market (the difference between the prices at which the asset can be bought and sold) changes as buyers and sellers submit new orders and amend old ones. Slippage (the difference between the target and actual prices of trade execution) is impacted by numerous phenomena including market volatility, market liquidity, the order type and the latency inherent in the trade execution path. The method of accounting for these time-varying trading conditions can have a big impact on the accuracy of a simulation. The most appropriate method will depend on the strategy and its intended use.
  • The granularity (sampling frequency) of the data used in the simulation, and its implications. Consider a simulation that relies on hourly open-high-low-close (OHLC) data. This would result in trade entry and exit parameters being evaluated every hour using only four data points from within that hour. What happens if a take-profit and a stop-loss were evaluated as being hit during the same OHLC bar? It isn't possible to know which one was hit first without looking at the data at a more granular level (a simple check for this ambiguity is sketched after this list). Whether this is a problem will depend on the strategy itself and its entry and exit parameters.
  • The accuracy of the data used in the simulation. No doubt you've heard the modelling adage "Garbage in, garbage out." If a simulation runs on poor data, the accuracy of the results will obviously deteriorate. Some of the vagaries of data include the presence of outliers or bad ticks, missing records, misaligned timestamps or wrong time zones, and duplicates. Financial data can have its own unique set of issues too. For example, stock data may need to be adjusted for splits and dividends. Some data sets are contaminated with survivorship bias, containing only stocks that avoided bankruptcy and thus building an upward bias into the aggregate price evolution. Over-the-counter products, like forex and CFDs, can trade at different prices at different times depending on the broker, so a data set obtained from one source may not be representative of the trade history of another source. Again, the extent to which these issues are problems depends on the individual algorithm and its intended use.
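As a rough illustration of the intrabar ambiguity described in the second bullet above, the sketch below flags hourly OHLC bars in which both a hypothetical take-profit and stop-loss level for a long position fall within the bar's range, so the fill order cannot be determined from the OHLC data alone. The column names, file name, and price levels are assumptions for illustration only.

import pandas as pd

def ambiguous_bars(bars: pd.DataFrame, take_profit: float, stop_loss: float) -> pd.Series:
    """Mark bars where both the take-profit and the stop-loss were touched."""
    hit_tp = bars["high"] >= take_profit
    hit_sl = bars["low"] <= stop_loss
    return hit_tp & hit_sl

# Usage (hypothetical data file): count how often the simulation would need
# more granular data to decide which exit was actually hit first.
# bars = pd.read_csv("hourly_ohlc.csv")
# print(ambiguous_bars(bars, take_profit=1.1050, stop_loss=1.0950).sum())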

 

In the next post, Kris will discuss Development Methodology.

 

 

 

Learn more about Robot Wealth here: https://robotwealth.com/

This article is from Robot Wealth and is being posted with Robot Wealth’s permission. The views expressed in this article are solely those of the author and/or Robot Wealth and IB is not endorsing or recommending any investment or trading discussed in the article. This material is for information only and is not and should not be construed as an offer to sell or the solicitation of an offer to buy any security. To the extent that this material discusses general market activity, industry or sector trends or other broad-based economic or political conditions, it should not be construed as research or investment advice. To the extent that it includes references to specific securities, commodities, currencies, or other instruments, those references do not constitute a recommendation by IB to buy, sell or hold such security. This material does not and is not intended to take into account the particular financial conditions, investment objectives or requirements of individual customers. Before acting on this material, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice.

 







Disclosures

We appreciate your feedback. If you have any questions or comments about IBKR Quant Blog please contact ibkrquant@ibkr.com.

The material (including articles and commentary) provided on IBKR Quant Blog is offered for informational purposes only. The posted material is NOT a recommendation by Interactive Brokers (IB) that you or your clients should contract for the services of or invest with any of the independent advisors or hedge funds or others who may post on IBKR Quant Blog or invest with any advisors or hedge funds. The advisors, hedge funds and other analysts who may post on IBKR Quant Blog are independent of IB and IB does not make any representations or warranties concerning the past or future performance of these advisors, hedge funds and others or the accuracy of the information they provide. Interactive Brokers does not conduct a "suitability review" to make sure the trading of any advisor or hedge fund or other party is suitable for you.

Securities or other financial instruments mentioned in the material posted are not suitable for all investors. The material posted does not take into account your particular investment objectives, financial situations or needs and is not intended as a recommendation to you of any particular securities, financial instruments or strategies. Before making any investment or trade, you should consider whether it is suitable for your particular circumstances and, as necessary, seek professional advice. Past performance is no guarantee of future results.

Any information provided by third parties has been obtained from sources believed to be reliable and accurate; however, IB does not warrant its accuracy and assumes no responsibility for any errors or omissions.

Any information posted by employees of IB or an affiliated company is based upon information that is believed to be reliable. However, neither IB nor its affiliates warrant its completeness, accuracy or adequacy. IB does not make any representations or warranties concerning the past or future performance of any financial instrument. By posting material on IB Quant Blog, IB is not representing that any particular financial instrument or trading strategy is appropriate for you.