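[Posted]: 2020-08-02 09:14:02
[Question]:
I am currently using R to backtest some football/soccer odds, and I use a model to create my own odds.
At the moment this is a very lengthy process, and I am curious whether a loop/function could help speed it up.
This code collects the results for the whole season.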
library(dplyr)
library(rvest)
library(tidyverse)
options(max.print = 9999)
Res <- read_html("https://www.betexplorer.com/soccer/england/premier-league/results/?month=all")
tbls_ls <- Res %>%
  html_nodes("table") %>%
  .[1] %>%
  html_table(fill = TRUE)
Results <- as.data.frame(tbls_ls)
Results <- Results[,c(1:2)]
names(Results) <- c("Fixture","Score")
Results <- tidyr::separate(Results, Fixture, into =c("HomeTeam","AwayTeam"), sep = " - ")
Results <- tidyr::separate(Results, Score, into = c("FTHG","FTAG"), sep = ":")
Results <- Results %>% tidyr::drop_na()
Results <- Results[,c(1:4)]
write.csv(Results, file = "Results.csv")
rownames(Results) <- 1:nrow(Results)
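I am backtesting the odds game week by game week; in the league I am testing there are 10 matches per game week. This code removes the most recent game week and sets up that week's fixtures as if they had not been played yet. This removes game week 29 (the last game week played in this league).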
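ResultsEdit <- Results #[-(1:10),]
# Fixtures to price: home and away team columns only
FixEdit <- ResultsEdit[,c(1:2)]
# Training data: everything except the 10 most recent matches
ResultsEditE <- Results [-(1:10),]
ResultsEditE <- ResultsEditE %>% tidyr::drop_na()
# Write the trimmed table (not the full Results) so the model never sees the removed week
write.csv(ResultsEditE, file="ResultsEditE")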
If I wanted to remove game weeks 29 and 28 and use game week 28 as the fixtures not yet played, I would edit the code to
ResultsEdit <- Results [-(1:10),]
ResultsEditE <- Results [-(1:20),]
and so on as I go further back.
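The manual edits above follow a regular pattern, so they could be wrapped in a small function. The following is only a sketch: `split_game_week` is a hypothetical helper name, and it assumes the table is ordered most recent first with exactly `per_week` rows per game week (postponed matches would break that assumption).

```r
# Hypothetical helper (not part of the original post): split a results table,
# ordered most recent first, into the fixtures of the k-th most recent game
# week and the older matches used to fit the model.
split_game_week <- function(results, k, per_week = 10) {
  stopifnot(k >= 1, k * per_week < nrow(results))
  fixture_rows <- ((k - 1) * per_week + 1):(k * per_week)
  list(
    fixtures = results[fixture_rows, c("HomeTeam", "AwayTeam")],
    training = results[-(1:(k * per_week)), ]
  )
}

# Toy data standing in for the scraped Results table (30 matches, 3 game weeks).
toy <- data.frame(
  HomeTeam = paste0("H", 1:30),
  AwayTeam = paste0("A", 1:30),
  FTHG = rep(1, 30),
  FTAG = rep(0, 30),
  stringsAsFactors = FALSE
)

# k = 2 reproduces the "remove the last two game weeks" edit shown above.
wk2 <- split_game_week(toy, k = 2)
wk2$fixtures$HomeTeam[1]  # "H11"
nrow(wk2$training)        # 10
```

With `k = 1` this corresponds to `FixEdit` taken from the first 10 rows and `ResultsEditE <- Results[-(1:10),]`.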
This is the Poisson code that predicts the odds.
library("vcd")
source("http://www.maths.leeds.ac.uk/~voss/projects/2010-sports/Football.R")
results0 <- read.csv("ResultsEditE",stringsAsFactors = F)
results0$X <- NULL
countres <- results0$FTHG + results0$FTAG
tg <- countres
fretabtg<-table(tg)
gf <- goodfit(fretabtg, type="poisson", method="ML")
Table0 <- Table(results0)
games <- results0
g <- nrow(games)
Y <- matrix(0,2*g,1)
for (i in 1:g) {
  Y[((2*i)-1)] <- games[i,3]
  Y[(2*i)] <- games[i,4]
}
teams <- sort(unique(c(games[,1], games[,2])), decreasing = FALSE)
n <- length(teams)
X <- matrix(0,2*g,((2*n)+1))
for (i in 1:g) {
  M <- which(teams == games[i,1])
  N <- which(teams == games[i,2])
  X[((2*i)-1),M] <- 1
  X[((2*i)-1),N+n] <- -1
  X[(2*i),N] <- 1
  X[(2*i),M+n] <- -1
  X[((2*i)-1),((2*n)+1)] <- 1
}
x <- qr(X)
x$rank
XX <- X[,-1]
TeamParameters <- Parameters(results0)
SimSeason <- Games(TeamParameters)
SimSeason <- SimSeason %>% tidyr::drop_na()
SimTable <- Table(SimSeason)
Simulations <- Sim(TeamParameters,3)
Probabilities <- ProbTable(TeamParameters,"", "")
ResultProbabilities<- ResultProbs(Probabilities)
cat("\nHome Win True Odds:", 100/ResultProbabilities$HomeWin)
cat("\nDraw True Odds:", 100/ResultProbabilities$Draw)
cat("\nAway Win True Odds:", 100/ResultProbabilities$AwayWin)
This code gives me the odds for the game week I want.
run_probs <- function(h_team, a_team) {
  Probabilities <- ProbTable(TeamParameters, h_team, a_team)
  ResultProbabilities <- ResultProbs(Probabilities)
  cat(paste("\n", h_team, "VS", a_team))
  cat("\nHome Win:", 100/ResultProbabilities$HomeWin)
  cat("\nDraw:", 100/ResultProbabilities$Draw)
  cat("\nAway Win:", 100/ResultProbabilities$AwayWin)
  return(ResultProbabilities)
}
FixEdit <- head(FixEdit, n=10)
prob_list <- Map(run_probs, FixEdit$HomeTeam,FixEdit$AwayTeam)
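What I really want to do is cut down the time it takes me to work through a season. Using the code I have provided as an example, is it possible to run some kind of loop for this?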
Run the game week 29 removal code, run the poisson code, run the code for giving me the odds for the game week - save the results in a CSV
Run the game week 28 removal code, run the poisson code, run the code for giving me the odds for the game week - save the results in a CSV
and so on.
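The repeated steps listed above could be driven by a loop over game weeks. The sketch below shows only the looping and CSV-saving skeleton: `backtest_weeks`, `price_week` and `dummy_price` are hypothetical names, and the file naming is an assumption. In the real workflow `price_week` would presumably fit `Parameters()` on the training rows and call `ProbTable()` and `ResultProbs()` for each fixture; here it is replaced by a placeholder so the example runs on its own.

```r
# Sketch only: `price_week` is a hypothetical callback, not a function from
# Football.R. It receives the training matches and the fixtures to price and
# must return a data frame with one row per fixture.
backtest_weeks <- function(results, n_weeks, price_week, per_week = 10,
                           out_dir = tempdir()) {
  out <- vector("list", n_weeks)
  for (k in seq_len(n_weeks)) {
    # Rows of the k-th most recent game week (table ordered most recent first)
    rows_fix <- ((k - 1) * per_week + 1):(k * per_week)
    fixtures <- results[rows_fix, c("HomeTeam", "AwayTeam")]
    # Everything older than that game week is used to fit the model
    training <- results[-(1:(k * per_week)), ]
    odds <- price_week(training, fixtures)
    out_file <- file.path(out_dir, sprintf("odds_week_back_%02d.csv", k))
    write.csv(odds, out_file, row.names = FALSE)
    out[[k]] <- out_file
  }
  unlist(out)
}

# Placeholder pricing function so the sketch is runnable: flat odds of 3.
dummy_price <- function(training, fixtures) {
  cbind(fixtures, HomeWin = 3, Draw = 3, AwayWin = 3)
}

# Toy data standing in for the scraped Results table (40 matches).
toy <- data.frame(HomeTeam = paste0("H", 1:40), AwayTeam = paste0("A", 1:40),
                  FTHG = 1, FTAG = 0, stringsAsFactors = FALSE)

files <- backtest_weeks(toy, n_weeks = 3, price_week = dummy_price)
week2 <- read.csv(files[2], stringsAsFactors = FALSE)
```

Passing the pricing step in as a function keeps the loop independent of the model, so the Poisson code could be swapped in without touching the week-by-week bookkeeping.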
Ideally each game week would return something like this.
    Home            Away             Home Win  Draw      Away Win
1   Leicester       Aston Villa      1.209044  9.009009  16.18123
2   Chelsea         Everton          1.634788  5.09165   5.216484
3   Manchester Utd  Manchester City  3.125     4.199916  2.265006
4   Arsenal         West Ham         1.786352  4.52284   4.56621
5   Burnley         Tottenham        3.08642   3.904725  2.379819
6   Crystal Palace  Watford          2.309469  3.079766  4.128819
7   Liverpool       Bournemouth      1.160362  10.04016  25.97403
8   Sheffield Utd   Norwich          1.637465  3.868472  7.639419
9   Southampton     Newcastle        2.198769  3.687316  3.654971
10  Wolves          Brighton         1.785714  4.016064  5.230126
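A table of this shape could be assembled from the fixtures and the list returned by `Map(run_probs, ...)`. This is a minimal sketch, assuming each list element exposes `HomeWin`, `Draw` and `AwayWin` as percentages (as the `100/...` conversions in the post imply). `odds_table` is a hypothetical helper and the probabilities below are made-up illustration values, not model output.

```r
# Hypothetical helper: turn per-fixture result probabilities (in percent)
# into a data frame of decimal odds, one row per fixture.
odds_table <- function(fixtures, prob_list) {
  to_odds <- function(p) 100 / p
  pull <- function(field) {
    vapply(prob_list, function(p) to_odds(p[[field]]), numeric(1),
           USE.NAMES = FALSE)
  }
  data.frame(
    Home = fixtures$HomeTeam,
    Away = fixtures$AwayTeam,
    HomeWin = pull("HomeWin"),
    Draw = pull("Draw"),
    AwayWin = pull("AwayWin"),
    stringsAsFactors = FALSE
  )
}

# Made-up probabilities for two fixtures, for illustration only.
fixtures <- data.frame(
  HomeTeam = c("Leicester", "Chelsea"),
  AwayTeam = c("Aston Villa", "Everton"),
  stringsAsFactors = FALSE
)
prob_list <- list(
  list(HomeWin = 50, Draw = 25, AwayWin = 25),
  list(HomeWin = 40, Draw = 20, AwayWin = 40)
)

odds <- odds_table(fixtures, prob_list)
# Save one CSV per game week; the file name here is an arbitrary example.
write.csv(odds, file.path(tempdir(), "odds_example.csv"), row.names = FALSE)
```

In the post's own code, `fixtures` would be `FixEdit` and `prob_list` the result of `Map(run_probs, FixEdit$HomeTeam, FixEdit$AwayTeam)`.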
Apologies if this does not make sense. If it reads like gibberish, feel free to lock/delete the post.
[Comments]:
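- ScoutingForJay, please take a moment to respect the "M" in minimal reproducible example. There is a lot of code here that (for us) does nothing at all and has no lasting effect (only side effects or introspection), which just obscures your question. Examples: the unsaved calls to glm, mean, var and table, and any lines you run only to look at something; since that output is not shown here, I can only infer that we do not need to see those lines. Long questions can be a deterrent, so please consider trimming the reproducible code to the minimum needed to demonstrate the problem.
- I only skimmed your code, but since you are already using tidyverse, you could certainly make it shorter and more readable by starting to use the pipe (%>%) operator.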