The Goal

The main purpose of this package is to provide a C++ framework for implementing complex technical stock factors, e.g., “WorldQuant 101 Alphas” and “GTJA 191 Alphas” (the Chinese title of this research paper is “基于短周期价量特征的多因子选股体系”, roughly “A Multi-Factor Stock Selection System Based on Short-Period Price and Volume Features”), in an efficient, maintainable and correct way.

This package currently implements all 191 alphas that are documented in GTJA’s research paper. We plan to make the package extensible in the future so that users can easily implement their own definitions by taking advantage of the C++ framework.

The difficulty

Most of these technical factors are machine-generated (by data mining), so their formulas are often nested several layers deep. For example, the formulas of the “alpha 87” and “alpha 160” factors in “GTJA 191 Alphas” look like this:

Alpha87: 
((RANK(DECAYLINEAR(DELTA(VWAP, 4), 7)) + 
TSRANK(DECAYLINEAR(((((LOW * 0.9) + 
(LOW * 0.1)) - VWAP) / (OPEN - ((HIGH + LOW) / 2))), 11), 7))
* -1)
Alpha160:
SMA((CLOSE<=DELAY(CLOSE,1)?STD(CLOSE,20):0),20,1)

As you can see, they are complicated in several ways:

  • They use not only the historical prices of the individual stock but also peer information (the values of other stocks) at each point in time
  • The formulas are deeply nested, so it is difficult for a researcher to write the implementation code correctly
  • Some functions can’t be expressed directly in code, so implementing a formula may require bloated, error-prone code
  • It’s difficult to know how much historical data a given formula requires, which makes optimization and NA handling harder
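
To illustrate the last point, here is a minimal sketch of how the required history length of a nested formula could be derived mechanically. The `Node` type, the helper functions and the counting conventions (a rolling window of n adds n - 1 bars, a lag of n adds n bars) are assumptions made for this illustration only; they are not part of this package, whose own conventions may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical expression tree, for illustration only.
enum class Kind { Field, Elementwise, Rolling, Lag };

struct Node {
  Kind kind;
  std::size_t n;               // window or lag; unused for Field and Elementwise
  std::vector<Node> children;
};

// A raw column such as CLOSE or VWAP.
Node field() { return Node{Kind::Field, 0, {}}; }

// Arithmetic or a cross-sectional operator: adds no history of its own.
Node elementwise(std::vector<Node> args) {
  return Node{Kind::Elementwise, 0, std::move(args)};
}

// A rolling operator such as DECAYLINEAR(A, n) or TSRANK(A, n).
Node rolling(std::size_t window, Node arg) {
  assert(window >= 1);
  return Node{Kind::Rolling, window, {std::move(arg)}};
}

// A lagging operator such as DELAY(A, n) or DELTA(A, n).
Node lag(std::size_t periods, Node arg) {
  return Node{Kind::Lag, periods, {std::move(arg)}};
}

// Number of daily bars (including today) needed to evaluate the node once.
std::size_t lookback(const Node& node) {
  std::size_t deepest = 1;  // a bare field needs today's bar only
  for (const Node& child : node.children) {
    deepest = std::max(deepest, lookback(child));
  }
  switch (node.kind) {
    case Kind::Rolling: return deepest + node.n - 1;
    case Kind::Lag:     return deepest + node.n;
    default:            return deepest;
  }
}
```

Under these assumed conventions, `rolling(7, lag(4, field()))`, which mirrors DECAYLINEAR(DELTA(VWAP, 4), 7), needs 11 bars, and the TSRANK branch of Alpha87 needs 17 bars, so the whole formula needs 17.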

What’s more, the efficiency of the implementation is very important: because the effectiveness of technical factors declines quickly, we need factor values at daily frequency. At the time of writing, there are more than 3000 stocks in the A-share market. Even if we are able to perform 100 calculations per second, it takes 10.5 hours to compute five years of historical values for a single factor (5 * 252 * 3000 / 3600 / 100).
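
The estimate above can be reproduced with a small helper, which also makes it easy to see how the cost scales with the number of stocks or the throughput. The function name and parameters are hypothetical and only restate the arithmetic in the text (one calculation per stock per trading day).

```cpp
#include <cassert>

// Hours needed to compute one factor for every stock on every trading day,
// assuming one calculation per stock per day (illustrative helper only).
double hours_for_one_factor(int years, int trading_days_per_year, int stocks,
                            double calcs_per_second) {
  assert(calcs_per_second > 0.0);
  const double calculations =
      static_cast<double>(years) * trading_days_per_year * stocks;
  const double seconds = calculations / calcs_per_second;
  return seconds / 3600.0;
}
```

With the figures from the text, `hours_for_one_factor(5, 252, 3000, 100.0)` gives 10.5 hours for a single factor.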

However, given the complexity of the formulas, without a framework it’s easy to use future information by accident (very dangerous in quant research) or to write incorrect code, while with a framework it’s difficult to keep the implementation efficient.
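
One common way to rule out accidental use of future information is to hand the formula a view that can only be indexed backwards from “today”. The sketch below shows the idea on a plain `std::vector<double>`; the `HistoryView` class and its methods are hypothetical names for this illustration and are not this package’s API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Read-only view of a series that exposes data up to and including `today`.
// Lags are unsigned, so a future bar cannot even be addressed.
class HistoryView {
 public:
  HistoryView(const std::vector<double>& series, std::size_t today)
      : series_(series), today_(today) {
    assert(today < series.size());
  }

  // Value `lag` bars before today; lag 0 is today itself.
  double at(std::size_t lag) const {
    if (lag > today_) throw std::out_of_range("not enough history");
    return series_[today_ - lag];
  }

  // The last n values up to and including today, oldest first.
  std::vector<double> window(std::size_t n) const {
    if (n == 0 || n > today_ + 1) throw std::out_of_range("not enough history");
    const auto first = series_.begin() + static_cast<std::ptrdiff_t>(today_ + 1 - n);
    const auto last = series_.begin() + static_cast<std::ptrdiff_t>(today_ + 1);
    return std::vector<double>(first, last);
  }

  // Number of bars visible through this view.
  std::size_t available() const { return today_ + 1; }

 private:
  const std::vector<double>& series_;  // the caller keeps the series alive
  std::size_t today_;
};
```

A formula written against such a view fails loudly when it asks for more history than exists, instead of silently reading a bar from the future.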

The solution

This package strives to provide a framework so that you can write those alpha formulas in an efficient, maintainable and correct way. The two alphas above can be implemented with C++ code like that shown below.

The first one still looks complicated, but if you check the code carefully, you can see that it closely follows the original formula. In addition, it avoids manual data handling and thus prevents you from using future data accidentally. More importantly, it runs fast by taking advantage of the zero-cost abstractions that C++ offers (it takes less than 1 minute to calculate the two alphas for all A-share stocks over the past five years, 201501 - 201912, on a regular PC using three cores).

The C++ code

Alpha_mfun alpha087 = [](const Quotes& qts) -> Timeseries {
  // DECAYLINEAR(DELTA(VWAP, 4), 7)
  auto decay_linear1 = [](const Quote& qt) {
    return decaylinear(
      qt.ts<double>(7, [](const Quote& qt){ return delta(qt.ts_vwap(4)); })
    );
  };
  // DECAYLINEAR((LOW * 0.9 + LOW * 0.1 - VWAP) / (OPEN - (HIGH + LOW) / 2), 11)
  auto decay_linear2 = [](const Quote& qt) {
    auto part1 = qt.ts_low(11) * 0.9 + qt.ts_low(11) * 0.1 - qt.ts_vwap(11);
    auto part2 = qt.ts_open(11) - (qt.ts_high(11) + qt.ts_low(11)) / 2;
    return decaylinear(part1 / part2);
  };
  // TSRANK(..., 7) over the second DECAYLINEAR term
  auto ts_rank = [decay_linear2](const Quote& qt) {
    return tsrank(qt.ts<double>(7, decay_linear2));
  };
  // RANK is applied across stocks; the result is (RANK(...) + TSRANK(...)) * -1
  Timeseries part1 = rank(qts.apply(decay_linear1));
  Timeseries part2 = qts.apply(ts_rank);
  return (part1 + part2) * -1.0;
};
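
For readers unfamiliar with the operators used in alpha087, the following is a minimal standalone sketch of DELTA, DECAYLINEAR and TSRANK on a plain `std::vector<double>`. These are not the package’s implementations: the function names, the oldest-first ordering, the weights 1..n normalised to sum to 1, and the tie handling in the rank are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All series are ordered oldest first; the last element is "today".

// DELTA(A, n): today's value minus the value n bars ago.
double delta(const std::vector<double>& a, std::size_t n) {
  assert(a.size() > n);
  return a.back() - a[a.size() - 1 - n];
}

// DECAYLINEAR(A, n): weighted mean of the last n values with weights 1..n,
// the newest value getting weight n, normalised so the weights sum to 1.
double decay_linear(const std::vector<double>& a, std::size_t n) {
  assert(n > 0 && a.size() >= n);
  double weighted = 0.0;
  double total = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double w = static_cast<double>(i + 1);
    weighted += w * a[a.size() - n + i];
    total += w;
  }
  return weighted / total;
}

// TSRANK(A, n): 1-based rank of today's value within the last n values,
// counted as the number of values in the window that are <= today's value.
std::size_t ts_rank(const std::vector<double>& a, std::size_t n) {
  assert(n > 0 && a.size() >= n);
  std::size_t rank = 0;
  for (std::size_t i = a.size() - n; i < a.size(); ++i) {
    if (a[i] <= a.back()) ++rank;
  }
  return rank;
}
```

Unlike these time-series operators, RANK compares one stock with its peers on the same date, which is why alpha087 takes all quotes (`Quotes`) rather than a single `Quote`.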

Alpha_fun alpha160 = [](const Quote& qt) -> double {
  auto fun = [](const Quote& qt) {
    return (qt.close() <= qt.close(1)) ? stdev(qt.ts_close(20)) : 0.0;
  };
  return sma(qt.ts<double>(20, fun), 1);
};
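
To show what the alpha160 formula computes without the framework, here is a standalone sketch on a plain vector of closing prices. It assumes the recursive form usually given for SMA(A, n, m), Y_i = (m * A_i + (n - m) * Y_{i-1}) / n, seeded with the first input value, and a sample standard deviation (divisor n - 1); both choices, like the function names, are assumptions and may not match the package.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample standard deviation of the n values ending at index `end` (inclusive).
double stdev_at(const std::vector<double>& a, std::size_t end, std::size_t n) {
  assert(n >= 2 && end < a.size() && end + 1 >= n);
  double mean = 0.0;
  for (std::size_t i = end + 1 - n; i <= end; ++i) mean += a[i];
  mean /= static_cast<double>(n);
  double ss = 0.0;
  for (std::size_t i = end + 1 - n; i <= end; ++i) {
    ss += (a[i] - mean) * (a[i] - mean);
  }
  return std::sqrt(ss / static_cast<double>(n - 1));
}

// SMA(A, n, m): Y_i = (m * A_i + (n - m) * Y_{i-1}) / n, seeded with A_0.
double sma(const std::vector<double>& a, std::size_t n, std::size_t m) {
  assert(!a.empty() && n > 0 && m <= n);
  const double dn = static_cast<double>(n);
  const double dm = static_cast<double>(m);
  double y = a.front();
  for (std::size_t i = 1; i < a.size(); ++i) {
    y = (dm * a[i] + (dn - dm) * y) / dn;
  }
  return y;
}

// SMA((CLOSE <= DELAY(CLOSE, 1) ? STD(CLOSE, std_window) : 0), sma_window, 1)
// evaluated for the last bar of `close` (ordered oldest first).
double alpha160_sketch(const std::vector<double>& close,
                       std::size_t std_window = 20,
                       std::size_t sma_window = 20) {
  assert(std_window >= 2 && sma_window >= 1);
  assert(close.size() >= std_window + sma_window - 1);
  std::vector<double> inner;
  inner.reserve(sma_window);
  for (std::size_t t = close.size() - sma_window; t < close.size(); ++t) {
    const bool not_up = close[t] <= close[t - 1];
    inner.push_back(not_up ? stdev_at(close, t, std_window) : 0.0);
  }
  return sma(inner, sma_window, 1);
}
```

With the default windows the sketch needs at least 39 closing prices, which shows how the history requirement grows with nesting even for a short formula.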

Example

library(techfactor)
head(tf_quote)
#>          DATE PCLOSE  OPEN  HIGH   LOW CLOSE     VWAP   VOLUME     AMOUNT
#> 1: 2018-01-02  31.06 31.45 32.99 31.45 32.56 32.46114 68343350 2218502767
#> 2: 2018-01-03  32.56 32.50 33.78 32.23 32.33 32.93164 64687020 2130249691
#> 3: 2018-01-04  32.33 32.76 33.53 32.10 33.12 32.89830 52908580 1740602533
#> 4: 2018-01-05  33.12 32.98 35.88 32.80 34.76 34.59591 84310196 2916787872
#> 5: 2018-01-08  34.76 35.11 36.96 35.11 35.99 36.04448 83078359 2994515872
#> 6: 2018-01-09  35.99 35.63 36.11 34.95 35.84 35.55054 47845909 1700947894
#>    BMK_CLOSE BMK_OPEN
#> 1:  3405.275 3405.275
#> 2:  3429.864 3429.864
#> 3:  3442.373 3442.373
#> 4:  3446.696 3446.696
#> 5:  3459.510 3459.510
#> 6:  3470.250 3470.250
(from_to <- range(tail(tf_quote$DATE)))
#> [1] "2018-04-26" "2018-05-07"

factors <- tf_reg_factors()
str(factors)
#>  chr [1:191] "alpha001" "alpha002" "alpha003" "alpha004" "alpha005" ...
#>  - attr(*, "normal")= chr [1:128] "alpha002" "alpha003" "alpha004" "alpha005" ...
#>  - attr(*, "panel")= chr [1:63] "alpha001" "alpha006" "alpha007" "alpha008" ...
(normal_factor <- attr(factors, "normal")[1])
#> [1] "alpha002"
(panel_factor <- attr(factors, "panel")[1])
#> [1] "alpha001"

qt <- tf_quote_xptr(tf_quote)
tf_qt_cal(qt, normal_factor, from_to)
#>             alpha002
#> 2018-04-26  0.228474
#> 2018-04-27 -1.238390
#> 2018-05-02  1.376597
#> 2018-05-03 -1.302913
#> 2018-05-04  1.133333
#> 2018-05-07 -1.404219

head(tf_quotes[1])
#> $SZ300333
#>             DATE PCLOSE  OPEN  HIGH   LOW CLOSE     VWAP   VOLUME     AMOUNT
#>    1: 2014-01-02  18.41 18.25 19.47 18.18 19.42 19.09579  4973297   94969018
#>    2: 2014-01-03  19.42 19.26 19.63 18.95 19.14 19.24656  4644767   89395800
#>    3: 2014-01-06  19.14 19.09 19.14 18.11 18.23 18.53877  3764967   69797853
#>    4: 2014-01-07  18.23 18.20 18.90 18.00 18.88 18.53837  3661866   67885019
#>    5: 2014-01-08  18.88 18.96 19.75 18.88 19.42 19.42419  5951451  115602106
#>   ---                                                                       
#> 1050: 2018-04-23  10.35 10.21 11.39 10.09 11.39 10.95390 75754103  829803230
#> 1051: 2018-04-24  11.39 10.99 12.53 10.86 12.53 11.71113 76179024  892142570
#> 1052: 2018-04-25  12.53 13.30 13.78 13.30 13.78 13.63979 88624083 1208814170
#> 1053: 2018-04-26  13.78 13.46 13.81 12.65 12.71 13.14234 87762946 1153410391
#> 1054: 2018-04-27  12.71 13.00 13.00 11.81 12.11 12.28674 69980225  859828858
#>       BMK_CLOSE BMK_OPEN
#>    1:  1962.750 1962.750
#>    2:  1945.083 1945.083
#>    3:  1897.140 1897.140
#>    4:  1904.342 1904.342
#>    5:  1912.186 1912.186
#>   ---                   
#> 1050:  3115.012 3115.012
#> 1051:  3181.041 3181.041
#> 1052:  3180.202 3180.202
#> 1053:  3120.147 3120.147
#> 1054:  3123.152 3123.152
qts <- tf_quotes_xptr(tf_quotes)
tf_qts_cal(qts, normal_factor, from_to)
#>              SZ300333    SH601158  SZ002788   SH603101   SH600020   SH601668
#> 2018-04-26  1.8965517  0.08571429  1.564356  0.4707602  0.4444444  0.3563636
#> 2018-04-27 -0.4007534 -0.92500000 -1.666667 -1.0959596 -0.7777778 -0.3200000
#>              SH600615 SZ002721  SZ300517   SH601567   SH603477   SZ002297
#> 2018-04-26  0.8758170       NA 0.5395764 -0.6153846  0.4538462  1.0396341
#> 2018-04-27 -0.3572985       NA 0.3940621 -0.2797203 -0.6760684 -0.6146341
#>              SH600537  SH603906   SH603183   SZ002884  SZ300531  SZ002641
#> 2018-04-26  1.0476190  1.604159  1.0332307  0.4903226  1.494949  0.800000
#> 2018-04-27 -0.7142857 -1.049057 -0.1057424 -2.0000000 -1.122807 -1.666667
#>              SZ002851   SH600719
#> 2018-04-26  0.4403752 -0.7593583
#> 2018-04-27 -0.5645370  0.7930283
tf_qts_cal(qts, panel_factor, from_to)
#>               SZ300333   SH601158   SZ002788  SH603101   SH600020   SH601668
#> 2018-04-26  0.04664214 -0.6441926 -0.5135616 0.3013375 -0.9032789 -0.6186114
#> 2018-04-27 -0.28739263 -0.5270030 -0.4062324 0.6185499 -0.8719299 -0.5925445
#>             SH600615 SZ002721   SZ300517   SH601567    SH603477    SZ002297
#> 2018-04-26 0.1917762       NA -0.6290566 -0.8998849 -0.06515103 -0.07402860
#> 2018-04-27 0.5320094       NA -0.7884800 -0.9069789 -0.05118596 -0.06976503
#>                SH600537   SH603906   SH603183   SZ002884   SZ300531   SZ002641
#> 2018-04-26 -0.296330051 -0.8887960 -0.2486362 -0.5629602 -0.6881489 -0.4926093
#> 2018-04-27 -0.009231862 -0.3809784 -0.2365224  0.4326827 -0.4215067  0.1503632
#>              SZ002851   SH600719
#> 2018-04-26         NA 0.10130851
#> 2018-04-27 -0.6780386 0.05142658

Session info

xfun::session_info(packages = 'techfactor')
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Catalina 10.15.3
#> 
#> Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
#> 
#> Package version:
#>   anytime_0.3.7     BH_1.72.0.3       data.table_1.12.9 graphics_3.6.2   
#>   grDevices_3.6.2   grid_3.6.2        lattice_0.20.38   magrittr_1.5     
#>   methods_3.6.2     Rcpp_1.0.4.5      stats_3.6.2       techfactor_0.2.0 
#>   utils_3.6.2       xts_0.12.0        zoo_1.8.7