Data Study

Unveiling the Power of Data: Enhancing Predictive Insights

In our pursuit of refined predictive insights, we delve into the intricate world of Japanese candlestick data. Our Data Study section embarks on a journey to unravel the nuances that lie within these candlesticks, seeking to discern the critical variables that elevate forecasting accuracy.

Exploring Beyond Traditional Metrics

Beyond conventional price metrics (close, open, high, low) and volume data lies a realm brimming with potential – the integration of external sentiment indicators. We examine fear and greed indices, volatility shifts, and other sentiment markers. Our hypothesis is that fusing these external indicators with traditional data will improve our prediction models.

Data Fusion: A Leap Forward

We embark on meticulous research to identify the optimal blend of variables, enriching our training datasets. This multi-dimensional approach goes beyond relying solely on autoregressive data. Our pursuit of innovation transcends conventional methodologies, incorporating a wide spectrum of external indicators that capture the pulse of market sentiment.

Temporal Insights: The Key to Precision

Our inquiry also explores the temporal dimension of model training. Can the time frame of the training data have a significant impact on accuracy? We investigate the differences between models trained on daily candlesticks from 1988 to 2012 and those trained on post-2012 data. The distinct market dynamics of these periods drive our effort to uncover the temporal factors that influence predictive success.

Quantifying Uniqueness: A Tale of Two Models

Intriguingly, certain market epochs exhibited distinct behaviours, leading us to contemplate the impact of varied data ranges. We propose two distinct models for evaluation: one encompassing data from 1988 to 2022, and the other focusing on data from 2012 to 2022. This comparative exploration strives to unveil the optimal data subset for fostering robust prediction accuracy.

In our relentless pursuit of precision, TerraBot emerges as a catalyst of innovation, fusing data science with financial foresight. Our Data Study journeys into the intricate tapestry of data, unravelling insights that redefine the very fabric of predictive modelling.

Our study in a nutshell

We generally used two types of data:

  1. Raw data:
    • price (close, open, high, low)
    • volume
  2. Derivatives of the raw data (complex indicators and custom signals):
    • RSI, EMA, TEMA, CCI, AROON_UP, ADX, APO, TRIX, CMO, TMA and many others.
    • Signals like: “kst_crossed_signal”, “price_crossed_keltner”, “price_crossed_lagf” (and many more).
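
As an illustration of how derived indicators and crossing signals of this kind can be computed from raw closes, here is a minimal sketch in plain Python. It is not TerraBot's implementation: the function names `ema`, `rsi`, and `crossed_signal` are hypothetical, the RSI uses Wilder smoothing with a common default period of 14, and the parameters actually used in the study are not stated.

```python
from typing import List, Optional


def ema(values: List[float], period: int) -> List[float]:
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def _rsi_value(avg_gain: float, avg_loss: float) -> float:
    # With no losses in the window, RSI is conventionally pinned at 100.
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """Relative Strength Index with Wilder smoothing.

    Positions without enough history are None.
    """
    out: List[Optional[float]] = [None] * len(closes)
    if len(closes) <= period:
        return out
    deltas = [closes[i] - closes[i - 1] for i in range(1, len(closes))]
    avg_gain = sum(max(d, 0.0) for d in deltas[:period]) / period
    avg_loss = sum(max(-d, 0.0) for d in deltas[:period]) / period
    out[period] = _rsi_value(avg_gain, avg_loss)
    for i in range(period, len(deltas)):
        avg_gain = (avg_gain * (period - 1) + max(deltas[i], 0.0)) / period
        avg_loss = (avg_loss * (period - 1) + max(-deltas[i], 0.0)) / period
        out[i + 1] = _rsi_value(avg_gain, avg_loss)
    return out


def crossed_signal(series: List[float], reference: List[float]) -> List[int]:
    """+1 where series crosses above reference, -1 where it crosses below, else 0."""
    out = [0]
    for i in range(1, len(series)):
        prev = series[i - 1] - reference[i - 1]
        curr = series[i] - reference[i]
        if prev <= 0 < curr:
            out.append(1)
        elif prev >= 0 > curr:
            out.append(-1)
        else:
            out.append(0)
    return out
```

A signal in the style of “price_crossed_keltner” could follow the same pattern, with the reference series replaced by a channel boundary instead of a moving average.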

Checking the applicability of external sentiment data:

  • The Fear and Greed Index – We tested this index on the crypto market and it did not show any improvement. We are considering repeating the test in the future to check its impact on 1d candles.
  • Earnings reports – We also tested earnings reports, but they had no significant impact on the 1h strategy.
  • Volatility index – We calculated our own internal vox indicators (we also checked earnings reports; see above).
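
Attaching an external index to candle data is essentially an as-of join: each candle takes the most recent index value published at or before its own timestamp, so no future information leaks into training. The sketch below assumes a daily index and 1h candles. `attach_asof`, the candle dictionary layout, and the `fear_greed` field name are hypothetical and only illustrate the idea.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any, Dict, List, Tuple


def attach_asof(
    candles: List[Dict[str, Any]],
    index_points: List[Tuple[datetime, float]],
    key: str = "fear_greed",
) -> List[Dict[str, Any]]:
    """Return copies of the candles with the latest known index value attached.

    candles: dicts that contain a 'time' entry (datetime).
    index_points: (time, value) pairs sorted by time.
    Candles that precede the first index point get None.
    """
    times = [t for t, _ in index_points]
    out = []
    for candle in candles:
        # Number of index points with time <= candle time.
        pos = bisect_right(times, candle["time"])
        value = index_points[pos - 1][1] if pos > 0 else None
        out.append({**candle, key: value})
    return out
```

Rows where the attached value is None can be dropped or imputed before training, depending on the model.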

Checking the suitability of the model training period:

Using stock prices from 1988 to 2010 for daily candles as a test sample gives different results from using prices after 2010. For the models we tested, training on pre-2010 data for 1h candles was not justified: it did not change efficiency but took much more time. Moreover, the models always needed the latest data to work correctly.

Model comparison results:

  • Data from 1988 to 2022: f1_score = 63%
  • Data from 2010 to 2022: f1_score = 64%

Results of testing other subsets of data:

  • Data from 2000 to 2022: f1_score = 66%
  • Data from 2010 to 2022: f1_score = 64%

We have decided to focus on the 2010-2022 period for training the models.
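
A training-period comparison of this kind can be organized as follows: hold the evaluation period fixed, vary only the start year of the training window, and score each window with F1. This is a sketch under assumptions, not the study's code. The row layout, `compare_windows`, and the `fit_predict` callback are hypothetical, and the source does not say how the evaluation split was chosen.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical row layout: (year, feature vector, binary label).
Row = Tuple[int, Sequence[float], int]
FitPredict = Callable[[List[Row], List[Row]], List[int]]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # also avoids division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compare_windows(
    rows: List[Row],
    start_years: Sequence[int],
    test_year: int,
    fit_predict: FitPredict,
) -> Dict[int, float]:
    """Score each training window on the same held-out period.

    Rows with year >= test_year form the fixed evaluation set; each window
    trains on rows with start <= year < test_year.
    """
    test = [r for r in rows if r[0] >= test_year]
    y_true = [r[2] for r in test]
    results: Dict[int, float] = {}
    for start in start_years:
        train = [r for r in rows if start <= r[0] < test_year]
        results[start] = f1_score(y_true, fit_predict(train, test))
    return results
```

Keeping the evaluation rows identical across windows means a difference in F1 reflects the training period rather than a change in the test data.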