You can make dummy variables from your factor using model.matrix, then fit the LASSO with cv.glmnet from the glmnet package:

    library(glmnet)

    # toy dataset: four 5-level factors, one numeric predictor and an unrelated response
    create_factor <- function(nb_lvl, n = 100) {
      factor(sample(letters[1:nb_lvl], n, replace = TRUE), levels = letters[1:nb_lvl])
    }
    df <- data.frame(var1 = create_factor(5),
                     var2 = create_factor(5),
                     var3 = create_factor(5),
                     var4 = create_factor(5),
                     var5 = rnorm(100),
                     y    = rnorm(100))

    # create dummy variables with model.matrix (example formula string)
    model_string <- "y ~ var1 + var2 + var3 + var4 + var5"
    # as.formula coerces the character string; [, -1] drops the intercept column,
    # because glmnet fits its own intercept
    x_train <- model.matrix(as.formula(model_string), df)[, -1]

    # fit your model; cv.glmnet picks a lambda sequence when none is supplied
    lasso_model <- cv.glmnet(x = x_train, y = df$y)
    coef(lasso_model, s = "lambda.min")

Don't forget the as.formula to coerce the character string to a formula. You can play around with values for lambda (a tuning parameter) or just leave the option blank and the function will pick a range for you. Since we didn't specify a relationship between our factors (var1, var2, etc.) and y, the LASSO does a good job and sets all coefficients to 0 except when the minimum amount of regularization is applied.

The other answers here point out ways to re-code your categorical factors as dummies. Depending on your application, that may not be a great solution.

If all you care about is prediction, then this is probably fine, and the approach provided by Flo.P should be okay. LASSO will find you a useful set of variables, and you probably won't be over-fit.

However, if you're interested in interpreting your model or discussing which factors are important after the fact, you're in a weird spot. By default, model.matrix uses what is referred to as "dummy coding" (I remember learning it as "reference coding"; see here for a summary), and the dummies it produces have very specific interpretations when taken by themselves. If one of these dummies is included, your model now has a parameter whose interpretation is "the difference between one level of this factor and an arbitrarily chosen other level of that factor". And maybe none of the other dummies for that factor were selected. You may also find that if the ordering of your factor levels changes, you end up with a different model.

There are ways to deal with this, but rather than kludge something together, I'd try the group lasso, which keeps or drops all the dummy columns of a factor together. Building on Flo.P's code above:

    install.packages("gglasso")
    library(gglasso)

    # design matrix without the intercept column (gglasso fits its own intercept)
    x <- model.matrix(~ ., dplyr::select(df, -y))[, -1]
    y <- df$y

    # one group per original variable: var1..var4 contribute 4 dummies each,
    # var5 is a single numeric column
    groups <- c(rep(1:4, each = 4), 5)

    fit <- gglasso(x = x, y = y, group = groups, lambda = 1)
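A small base-R illustration of the reference-coding point, using a made-up three-level factor f (hypothetical data, not from the answers above). With the default treatment contrasts, model.matrix absorbs the first level into the intercept, so each dummy means "this level versus the reference level". Releveling the same data changes which dummy columns exist, even though the columns span the same space.

```r
# Toy factor (hypothetical data) with levels a, b, c; "a" is the reference by default
f <- factor(c("a", "b", "c", "a", "b", "c"))
toy <- data.frame(f = f)

# Default treatment (reference) coding: intercept plus one dummy per non-reference level
x_default <- model.matrix(~ f, toy)
print(colnames(x_default))

# Same data, but with "c" as the reference level
toy$f <- relevel(toy$f, ref = "c")
x_relevel <- model.matrix(~ f, toy)
print(colnames(x_relevel))

# Both matrices span the same column space (rank 3 when stacked side by side),
# but the individual dummy columns, and so what a LASSO can zero out, differ
print(qr(cbind(x_default, x_relevel))$rank)
```

An unpenalized fit would give identical predictions either way; it is the per-column penalty of the LASSO that makes the choice of reference level matter.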
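Writing the group vector by hand gets error-prone as the number of variables grows. One option (a sketch, not taken from the answers above; the data frame toy is hypothetical) is to read it off the "assign" attribute that model.matrix attaches, which maps each column to the term it came from. gglasso's documentation asks for consecutive integer group labels; once the intercept is removed, these term indices already are 1, 2, 3, ... as long as every term contributes at least one column.

```r
set.seed(1)
# Hypothetical data: a 5-level factor, a 3-level factor and a numeric column
toy <- data.frame(
  var1 = factor(sample(letters[1:5], 20, replace = TRUE), levels = letters[1:5]),
  var2 = factor(sample(letters[1:3], 20, replace = TRUE), levels = letters[1:3]),
  var3 = rnorm(20)
)

x_full <- model.matrix(~ ., toy)
# "assign" gives the term index of each column (0 = intercept)
term_index <- attr(x_full, "assign")

# Subsetting the matrix drops the "assign" attribute, so keep it separately
x <- x_full[, -1]
groups <- term_index[-1]
print(groups)
```

Here var1 yields 4 dummies, var2 yields 2 and var3 yields 1 column, so the group vector comes out as 1, 1, 1, 1, 2, 2, 3.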
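To see why the group lasso treats a factor as a unit, here is a toy comparison of the two shrinkage rules in the simplified case of an orthonormal design (an assumption made for illustration; gglasso's actual solver handles general designs and is not reproduced here). The vector z holds hypothetical unpenalized estimates for the four dummies of one 5-level factor.

```r
# Elementwise soft-thresholding: the LASSO update for single coefficients
# (orthonormal-design case), applied to each dummy separately
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Group soft-thresholding: the group-lasso update for a whole group
# (orthonormal-design case); the group is either zeroed or shrunk together
group_soft_threshold <- function(z, lambda) {
  norm_z <- sqrt(sum(z^2))
  if (norm_z <= lambda) return(rep(0, length(z)))
  (1 - lambda / norm_z) * z
}

# Hypothetical unpenalized estimates for the 4 dummies of one factor
z <- c(1.5, 0.3, -0.2, 0.1)
lambda <- 0.5

print(soft_threshold(z, lambda))        # only the first dummy survives
print(group_soft_threshold(z, lambda))  # all four dummies kept, each shrunk
print(group_soft_threshold(z, 2))       # large penalty: the whole factor is dropped
```

The plain LASSO keeps one "level versus reference" contrast and discards the rest, which is exactly the hard-to-interpret situation described above, while the group rule either keeps the factor with all its levels or removes it entirely.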