项目描述
因为本次竞赛必须要报名才能拿到题目,所以我将Kaggle的原题复制在此处:
High Frequency Futures Market Forecasting
Overview
In this competition, participants will create a model based on real futures tick data sourced from the actual market, providing invaluable insight into the everyday obstacles faced in effective trading strategies. This event not only challenges competitors to navigate the intricacies of financial data but also emphasizes the complexities involved in financial market modeling. Participants will encounter issues like heavy-tailed distributions, which can lead to unexpected market behavior, dynamic time series that require adaptive algorithms, and the sudden shifts in market trends that can disrupt even the best-laid plans. By engaging with these real-world challenges, competitors will enhance their understanding of market dynamics and improve their modeling skills.
Description
In modern finance, navigating the complexities of market fluctuations is a fundamental challenge for investors. Regardless of strategy, volatility is an inherent part of this landscape, forcing professional investors to estimate their potential returns while facing varying risks associated with different asset classes. The unpredictability of the market, characterized by heavy-tailed distributions and non-stationary time series, further complicates these predictions. As financial data science evolves, innovative algorithms and models are being developed to enhance the forecasting capabilities of quantitative analysts.
In this competition, participants will be tasked with creating a model that predicts the return rate of investments using real market tick data. By training and testing their algorithms on historical pricing data, competitors will confront the practical challenges of financial modeling. The top submissions will not only aim for accuracy but also demonstrate creative problem-solving skills in addressing this intricate data science issue.
Success in this endeavor could significantly enhance the ability of quantitative researchers to make informed predictions, ultimately empowering investors of all sizes to make better decisions. Moreover, participants may discover a talent for analyzing financial datasets, potentially opening doors to exciting opportunities across various sectors.
Evaluation
We have eight prediction targets, and we will calculate the Root Mean Square Error (RMSE) for the predicted values and the actual values for each column. The final evaluation metric will be the Mean Columnwise Root Mean Squared Error across the eight columns.
Submission File
For each ID in the test set, you must predict market returns [Y0-Y7 variables]. The file should contain a header and have the following format:
ID,Y0,Y1,Y2,Y3,Y4,Y5,Y6,Y7
2, 0, 0, 0, 0, 0, 0, 0, 0
6, 0, 0, 0, 0, 0, 0, 0, 0
9, 0, 0, 0, 0, 0, 0, 0, 0
etc.
省流
简单来说,我们要通过训练集找出数据潜在的规律,并给出预测。训练集的数据规模为[393087 rows x 568 columns],我推测每一行代表一个产品,每一列是该产品的某个特征(第一列为ID,可忽略;后559列为X,最后8列为Y),我们需要通过X去预测Y。主要的难点在于,我们并不知道数据的经济学意义,所以需要自行分析,并根据数据的特点设计相应的策略。