High-Frequency Data Analysis - Day 1

Roh

2021/01/21

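Note: This page covers Problem 1 of Homework 7 for FINM 33170 and STAT 33910, taught by Professor Per Mykland at the University of Chicago. Please be aware that what I write here is not the official solution (in fact, I do not know the correct answer). If you find errors or have questions, please let me know by email.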

R code blocks and Python code blocks are distinguished by color, as shown below. Output is shown in the same ivory-colored block for both.

# "This is R code."
# "This is Python code."

HW7

Question 1

Retrieve the bid and offer prices for Apple (AAPL) and Google (GOOG) for Dec 3, 2012. The data are from the TAQ database in WRDS.

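library(dplyr)  # %>%, filter(), mutate()
library(tidyr)  # drop_na()

wd_win = "G:/공유 드라이브/Project_TBD/Stock_Data/Misc"
wd_mac = "/Volumes/GoogleDrive/공유 드라이브/Project_TBD/Stock_Data/Misc"
# List the CSV files in the data directory (returned in alphabetical order)
fn = list.files(path=wd_win, pattern = '\\.csv$')
trades = read.csv(file.path(wd_win, fn[2]))
quotes = read.csv(file.path(wd_win, fn[1]))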

dim(trades)
## [1] 70182     7
dim(quotes)
## [1] 495848      6
head(trades)
##       DATE      TIME_M SYM_ROOT SYM_SUFFIX SIZE  PRICE TR_CORR
## 1 20121203 4:00:01.360     AAPL         NA  100 589.50       0
## 2 20121203 4:04:09.043     AAPL         NA  145 589.50       0
## 3 20121203 4:04:31.208     AAPL         NA  300 589.32       0
## 4 20121203 4:12:19.971     AAPL         NA  100 590.21       0
## 5 20121203 4:31:10.157     AAPL         NA  100 589.90       0
## 6 20121203 4:53:04.991     AAPL         NA  100 589.34       0
head(quotes)
##       DATE      TIME_M SYM_ROOT SYM_SUFFIX    BID    ASK
## 1 20121203 4:00:01.084     AAPL         NA 589.50 589.60
## 2 20121203 4:02:11.715     AAPL         NA 589.50 589.95
## 3 20121203 4:04:03.310     AAPL         NA 589.50 589.73
## 4 20121203 4:04:09.043     AAPL         NA 588.25 589.95
## 5 20121203 4:04:09.048     AAPL         NA 588.25 589.50
## 6 20121203 4:04:21.689     AAPL         NA 588.25 589.38
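# Keep TIME_M and PRICE; drop zero prices and missing values
trades2 = 
  trades[,c(2,6)] %>% 
  filter(PRICE != 0) %>%
  drop_na()

# Keep TIME_M, BID and ASK; drop zero quotes, spreads of 100 or more, and missing values
quotes2 = 
  quotes[,c(2,5,6)] %>%
  mutate(spread = ASK - BID) %>%
  filter(BID !=0 & ASK !=0) %>%
  filter(spread < 100) %>%
  drop_na()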

Question 2

Use any program to construct signature plots for the realized volatility of the bid and the offer (i.e., ask), the midquote ((bid+offer)/2), as well as the transaction log prices. Also provide a plot for the realized co-volatility between bid and ask log prices.

Theoretical Concept

We focus on a single asset traded in a liquid financial market. We assume that the sample path of the corresponding (latent) price process, \(\{S_t^*, 0 \le t\}\), is continuous and determined by the stochastic differential equation (SDE)

\[r_t = d\log\left(S_t^*\right) =\sigma_t dW_t\] where \(W_t\) denotes a standard Brownian motion, and the spot volatility process \(\sigma_t\) is predictable and has a continuous sample path. We assume that \(\sigma_t\) and \(W_t\) processes are uncorrelated and, for convenience, we refer to the unit time interval as a day.

Integrated Volatility [IV]

Primary interest centers on forecasting the (latent) integrated volatility over daily and longer inter-daily horizons. Specifically, we define the one-period integrated volatility,

\[IV_{t+1} = \int_{t}^{t+1} \sigma_\tau^2 d\tau\]

and, for \(m\) a positive integer, the corresponding multi-period measure,

\[IV_{t+1:t+m} = \sum_{j=1}^m IV_{t+j}\]

In this context, \(IV_t\) equals the quadratic return variation which, in turn, provides a natural measure of ex-post return variability.
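
As a quick sanity check on these definitions (an illustrative special case, not part of the original notes), suppose the spot volatility is constant, \(\sigma_\tau = \sigma\). Then

\[IV_{t+1} = \int_{t}^{t+1} \sigma^2 d\tau = \sigma^2, \qquad IV_{t+1:t+m} = \sum_{j=1}^m \sigma^2 = m\sigma^2\]

so the one-period integrated volatility is simply the daily variance, and the multi-period measure grows linearly with the horizon \(m\).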

Integrated volatility is not directly observable but, as highlighted by many papers, the corresponding RVs provide a consistent estimate of \(IV_t\). The standard RV measure is introduced below.

Realized Volatility [RV]

Realized volatility [RV] is a non-parametric ex-post estimate of the return variation. The most obvious realized volatility measure is the sum of finely sampled squared return realizations over a fixed time interval.

\[RV_t^*(h) = \sum_{i=1}^{1/h} \left(r_{t-1+ih}^{*(h)}\right)^2\] where \(1/h\) is assumed to be a positive integer and \(r_{t}^{*(h)} = \log(S_t^*)-\log(S_{t-h}^*)\). Formally, \(RV_{t}^{*}(h)\) is uniformly consistent for \(IV_t\) as \(h \rightarrow 0\), i.e., as the intraday sampling frequency goes to infinity.
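
The consistency claim can be checked numerically. The following is a minimal sketch, not the homework solution: it simulates one day of a driftless log price with constant spot volatility (so that the integrated volatility is known) and sums the squared returns. The function name `realized_variance` and the values of `sigma` and `n` are assumptions chosen for illustration.

```python
import math
import random


def realized_variance(log_prices, step=1):
    """Sum of squared log returns, sampling every `step`-th observation."""
    sampled = log_prices[::step]
    return sum((b - a) ** 2 for a, b in zip(sampled, sampled[1:]))


# Hypothetical one-day path with constant spot volatility, so IV = sigma**2.
random.seed(0)
sigma = 0.02   # assumed daily volatility
n = 23400      # assumed number of intraday returns, i.e. h = 1/n
h = 1.0 / n

log_prices = [0.0]
for _ in range(n):
    log_prices.append(log_prices[-1] + sigma * math.sqrt(h) * random.gauss(0.0, 1.0))

iv = sigma ** 2
rv = realized_variance(log_prices)
print(f"IV = {iv:.3e}, RV = {rv:.3e}")
```

With no microstructure noise in the simulated path, the printed RV should sit close to the IV, and the gap shrinks as `n` grows.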

Signature Plot

A microstructure bias, if operative, will probably manifest itself as sampling frequency increases by distorting the average realised volatility. We construct a plot of average realised volatility against sampling frequency, which we call the “volatility signature plot”. This helps to reveal the severity of microstructure bias as sampling frequency increases, and can be useful in guiding the selection of sampling frequency. We can also use the volatility signature plots to characterise different market microstructures.

Interestingly, it turns out that the volatility signature has the same form as the variance-time function, which has been extensively studied in finance. However, while there is no information in the volatility signature that is not also present in high-frequency return autocorrelations, the two are complements, not substitutes, as the information relevant for construction and interpretation of realised volatilities is more directly and transparently revealed in the volatility signature plot.
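
To make the idea concrete, here is a small simulated sketch of the numbers behind a signature plot. It assumes the simplest noise model, an efficient log price observed with i.i.d. additive noise; the parameter values and the name `realized_variance` are illustrative assumptions, and the quantity computed is the raw (unannualized) realized variance rather than the annualized volatility used elsewhere on this page.

```python
import math
import random


def realized_variance(log_prices, k):
    """Realized variance from returns sampled every k observations."""
    sampled = log_prices[::k]
    return sum((b - a) ** 2 for a, b in zip(sampled, sampled[1:]))


random.seed(1)
sigma = 0.02        # assumed daily volatility of the efficient price
noise_sd = 0.0005   # assumed standard deviation of the microstructure noise
n = 23400           # assumed number of observations in the day

efficient = [0.0]
for _ in range(n):
    efficient.append(efficient[-1] + sigma / math.sqrt(n) * random.gauss(0.0, 1.0))
observed = [x + random.gauss(0.0, noise_sd) for x in efficient]

# One point of the signature plot per sampling interval k.
signature = {k: realized_variance(observed, k) for k in (1, 5, 30, 300)}
for k, rv in signature.items():
    print(f"k = {k:4d}  RV = {rv:.3e}  (IV = {sigma ** 2:.3e})")
```

Under this noise model the realized variance at the finest sampling interval is dominated by the noise, and it falls toward the integrated variance as `k` grows, which is the downward-sloping shape a signature plot is meant to expose.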

Take log

bid_series = log(quotes2$BID)
ask_series = log(quotes2$ASK)
mid_series = log((quotes2$BID + quotes2$ASK)/2)
price_series = log(trades2$PRICE)

Calculate Averaged Realized Volatility

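ARV = function(x, y=x, k){
  # Overlapping lag-k increments: x[i+k] - x[i] for every i
  diff_x = diff(x, lag=k)
  diff_y = diff(y, lag=k)
  # Average over the k offset sub-grids
  arv = sum(diff_x * diff_y) / k
  # Annualize with a factor of 250 and take a signed square root,
  # so that a negative covariation stays negative
  if (arv > 0){
    result = sqrt(arv*250)
  }
  else {
    result = -sqrt(-arv*250) 
  }
  return(result)
}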
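
Written out, the function above appears to compute the following, where \(n\) is the length of the series. The symbol \(A_k\) is introduced here only for illustration and does not appear in the original notes.

\[A_k(x,y) = \frac{1}{k}\sum_{i=1}^{n-k}\left(x_{i+k}-x_i\right)\left(y_{i+k}-y_i\right), \qquad \mathrm{ARV}_k(x,y) = \operatorname{sign}(A_k)\sqrt{250\,|A_k|}\]

With \(y = x\), \(A_k\) is the average of the \(k\) realized variances obtained from the \(k\) offset grids of lag-\(k\) returns. With \(y \neq x\) it is the corresponding averaged realized covariation, and the signed square root keeps a negative covariation negative after the transformation.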

Get ARVs for bid, ask, mid, and trade

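num_k = 100            # number of sampling intervals to evaluate
step = 2               # spacing between consecutive values of k
max_k = num_k * step
# One row per k, initialized to zero
ARVs = data.frame(bid = numeric(num_k),
                  ask = numeric(num_k),
                  mid = numeric(num_k),
                  trade = numeric(num_k),
                  covol = numeric(num_k)
                  )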

for (i in 1:num_k){
  ARVs[i,1] = ARV(bid_series, bid_series, step*i)
  ARVs[i,2] = ARV(ask_series, ask_series, step*i)
  ARVs[i,3] = ARV(mid_series, mid_series, step*i)
  ARVs[i,4] = ARV(price_series, price_series, step*i)
  # Realized Covariation of Bid and Ask
  ARVs[i,5] = ARV(bid_series, ask_series, step*i)
}

ARVs =
  ARVs %>%
  rowwise() %>%
  mutate(corr= covol/(bid*ask))
ARVs['k'] = seq(step,max_k,step)
#ARVs$ID = seq.int(nrow(ARVs))
head(ARVs)
## # A tibble: 6 x 7
## # Rowwise: 
##     bid   ask   mid trade  covol    corr     k
##   <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl> <dbl>
## 1  40.3  42.2  26.0 0.472 -18.9  -0.0111     2
## 2  28.8  30.0  18.4 0.364 -13.7  -0.0158     4
## 3  23.7  24.7  15.1 0.316 -11.4  -0.0194     6
## 4  20.5  21.5  13.1 0.290  -9.77 -0.0222     8
## 5  18.4  19.3  11.8 0.273  -8.91 -0.0250    10
## 6  16.9  17.7  10.8 0.260  -8.11 -0.0272    12
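
One thing worth checking in the table above: `bid`, `ask` and `covol` are all on the signed-square-root scale, so `covol/(bid*ask)` divides a root by a product of two roots and is not a correlation coefficient. A hedged sketch of an alternative is below. It squares `covol` back to the covariance scale before dividing, which is my own suggestion rather than part of the assignment, and the helper name `realized_corr` is hypothetical. The input is the first row of the table, with its rounded values.

```python
import math


def realized_corr(bid, ask, covol):
    """Correlation from signed-root inputs: sign(covol) * covol**2 / (bid * ask)."""
    # covol**2 is 250*|cov| and bid*ask is 250*sqrt(var_bid*var_ask),
    # so the annualization factor cancels in the ratio.
    return math.copysign(covol ** 2, covol) / (bid * ask)


# First row of the table above (k = 2), using the rounded values shown there.
corr = realized_corr(bid=40.3, ask=42.2, covol=-18.9)
print(f"{corr:.3f}")
```

The result lies between -1 and 1 as a correlation should. It is still negative, so this rescaling does not by itself explain why the bid and ask appear to move in opposite directions.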

Plot

Although high-frequency microstructural effects will be small for larger sampling intervals k, realised volatility estimates constructed from larger return intervals will begin to suffer from a higher sampling error. There is a trade-off between minimising microstructural bias and minimising sampling error.

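  • When k is small, the sampling frequency is very high (highly frequent sampling)
  • The bid and ask RVs are the largest
  • The midquote RV is larger than the transaction price RV
  • When the sampling frequency is high, the covariance is also very large in magnitude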

Mystery

  • Shouldn't the BID and ASK return series move in the same direction?
  • Shouldn't the RV of the transaction price be larger than that of the MID?

Clue

  • Spread Plot

  • Bid Plot

Question 3

For the same data, compute the statistic \(L_T^{(n,d)}\) from the previous problem set, as well as \(\widehat{QV}\). Recall that $ L_T^{(n,d)}/$ is asymptotically \(N(0,1)\) if the \(X\) process is an Ito process. On this basis, test the null hypothesis \(H_0\) that the \(X\) process is an Ito process.

\(L_T^{(n,d)}\)

\[L_{t}^{(n,d)}=\sum_{t_{i+1}\le t}\Delta X_{t_{i-1}}\Delta X_{t_{i}}\] where \(\Delta X_{t_{i-1}}=X_{t_{i}}-X_{t_{i-1}}\) and \(\Delta X_{t_{i}}=X_{t_{i+1}}-X_{t_{i}}\).
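
The sum of products of adjacent increments can be sketched as follows. This is only an illustration on simulated data, not the computation behind the output on this page: the function name `lag_one_product_sum` and all parameter values are assumptions, and the normalization needed to turn \(L_T^{(n,d)}\) into a test statistic is not reproduced here.

```python
import random


def lag_one_product_sum(x):
    """Sum over i of (X[i] - X[i-1]) * (X[i+1] - X[i])."""
    dx = [b - a for a, b in zip(x, x[1:])]
    return sum(p * q for p, q in zip(dx, dx[1:]))


random.seed(2)
n = 10000

# A pure random walk: adjacent increments are independent.
walk = [0.0]
for _ in range(n):
    walk.append(walk[-1] + random.gauss(0.0, 0.01))

# The same walk observed with i.i.d. additive noise (assumed noise model).
noisy = [w + random.gauss(0.0, 0.01) for w in walk]

l_walk = lag_one_product_sum(walk)
l_noisy = lag_one_product_sum(noisy)
print(f"random walk: {l_walk:.4f}, walk plus noise: {l_noisy:.4f}")
```

For the random walk the sum fluctuates around zero, while additive noise makes adjacent increments negatively correlated and pushes the sum clearly below zero. That negative first-order autocovariance is the kind of departure from an Ito process the test is designed to pick up.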

##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## -36.96480000  -0.00023600   0.00000000  -0.00296394   0.00000000   1.15793800

\(\widehat{QV}\)

Test

##        Min.     1st Qu.      Median        Mean     3rd Qu.        Max. 
## -6.12000000 -0.01000000  0.00000000 -0.00981701  0.00000000  3.97270000
##         Min.      1st Qu.       Median         Mean      3rd Qu.         Max. 
## -78.65013227  -0.00235544   0.12636432   0.00000000   0.12636432  51.26286564

  • This shows that we can reject the null hypothesis that the \(X\) process is an Ito process.

Question 4

Determine a reasonable choice of J and K for using TSRV based on transaction log prices.
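
One way to explore candidate values is to evaluate the estimator over a small grid of (J, K) pairs. The sketch below assumes the generalized two-scale form with a small-sample adjustment, \(\left(1-\bar n_K/\bar n_J\right)^{-1}\left([Y,Y]^{(K)} - (\bar n_K/\bar n_J)[Y,Y]^{(J)}\right)\) with \(\bar n_K = (n-K+1)/K\). Please check this against the course notes, since the exact definition used in class is not shown on this page. The function names, the simulated data and all parameter values are hypothetical.

```python
import math
import random


def averaged_rv(y, k):
    """[Y,Y]^(k): squared lag-k increments over all start points, divided by k."""
    return sum((y[i + k] - y[i]) ** 2 for i in range(len(y) - k)) / k


def tsrv(y, j, k):
    """Two-scale estimate from a fast scale j and a slow scale k (1 <= j < k)."""
    n = len(y) - 1                 # number of one-step increments
    nbar_j = (n - j + 1) / j       # average number of increments per sub-grid
    nbar_k = (n - k + 1) / k
    ratio = nbar_k / nbar_j
    raw = averaged_rv(y, k) - ratio * averaged_rv(y, j)
    return raw / (1.0 - ratio)     # small-sample adjustment


# Simulated stand-in for the transaction log prices (all values assumed).
random.seed(3)
sigma, noise_sd, n = 0.02, 0.0005, 23400
x = [0.0]
for _ in range(n):
    x.append(x[-1] + sigma / math.sqrt(n) * random.gauss(0.0, 1.0))
y = [v + random.gauss(0.0, noise_sd) for v in x]

iv = sigma ** 2
rv_all = averaged_rv(y, 1)   # plain RV on every observation
print(f"IV = {iv:.3e}, RV = {rv_all:.3e}")
for j, k in [(1, 10), (1, 30), (5, 60)]:
    print(f"J = {j}, K = {k}: TSRV = {tsrv(y, j, k):.3e}")
```

On real data the integrated variance is unknown, so a reasonable (J, K) would have to be judged by how stable the estimate is across neighbouring choices rather than by its distance to a known target.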

