Anyway, this function takes a table of N columns and builds every possible combination of M columns (M < N). It then runs another function called LDA (not written by me, it is lda() from the MASS package) on each combination, scores each result, and gives me back the combination with the best score, which here means the largest separation between the groups.
Look:
ldaMaxSeparation <- function(dataframe, variables, groupvariable, combinationnumber)
{
  # load variables as a data frame
  variables <- as.data.frame(variables)
  # find the variable names
  variablenames <- colnames(variables)
  # create all variable combinations of the requested size
  variablecombinations <- combn(variablenames, combinationnumber, simplify=FALSE)
  # find out how many combinations we have
  numcombinations <- length(variablecombinations)
  # provide the lda function
  library(MASS)
  # create a vector to hold the separation score of each combination
  separation <- numeric(numcombinations)
  # build a data frame for each combination and run LDA on it
  for (i in seq_len(numcombinations))
  {
    # drop=FALSE keeps a data frame even when only one column is selected
    dataframei <- cbind(groupvariable, dataframe[, variablecombinations[[i]], drop=FALSE])
    # run LDA with equal priors for the 3 groups
    ldai <- lda(groupvariable~., data=dataframei, prior=c(1,1,1)/3)
    # separation score: sum of the squared singular values
    separation[i] <- sum((ldai$svd)^2)
  }
  maxseparation <- max(separation)
  # which.max returns a single index even when several combinations tie
  indexmaxsep <- which.max(separation)
  bestcomb <- variablecombinations[[indexmaxsep]]
  print(maxseparation)
  print(bestcomb)
}
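If you want to see the idea on data you already have, here is a minimal sketch of the same search on R's built-in iris table (4 measurement columns, 3 species). It is not my plant data, and the helper name `score` is made up just for this example; the only real library call is lda() from MASS.

```r
library(MASS)  # provides lda()

# iris stands in for the plant table: 4 measurement columns, 3 species
vars <- colnames(iris)[1:4]

# every pair of columns: choose(4, 2) = 6 combinations
combos <- combn(vars, 2, simplify = FALSE)

# hypothetical helper: separation score for one set of columns,
# taken as the sum of the squared singular values returned by lda()
score <- function(cols) {
  d <- iris[, c("Species", cols)]
  fit <- lda(Species ~ ., data = d, prior = c(1, 1, 1) / 3)
  sum(fit$svd^2)
}

scores <- sapply(combos, score)

# combination with the largest separation score
best <- combos[[which.max(scores)]]
print(best)
```

Keep in mind the number of combinations grows fast: choose(N, M) models get fitted, so it is worth checking that number before running this on a wide table.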
What this gives me in practice is that I can use just 5 columns and still get what I want. When I was trying by hand, the best I could get down to was 9 (fewer is better).
Why would I want to do that?
Every column represents the content of a certain compound in a plant. I have 3 very closely related plant species, and now I can identify them with a 1% error rate by quantifying just 5 compounds. Not LIGO league, I know, but I live in a 3rd world country xd.
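For anyone wondering how an error rate like that can be checked: one common way (an assumption for this example, not necessarily how every analysis does it) is leave-one-out cross-validation, which lda() supports through CV=TRUE. A rough sketch on iris again, with two columns picked arbitrarily for illustration:

```r
library(MASS)  # provides lda()

# two iris columns chosen only for illustration
cols <- c("Petal.Length", "Petal.Width")
d <- iris[, c("Species", cols)]

# CV=TRUE makes lda() return leave-one-out predictions in $class
cv <- lda(Species ~ ., data = d, prior = c(1, 1, 1) / 3, CV = TRUE)

# fraction of plants assigned to the wrong species
err <- mean(cv$class != d$Species)
print(err)
```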