scikit-learn - Python logistic regression - patsy design matrix and categorical data
I am quite new to Python and machine learning.
I am trying to build a logistic regression model. I have worked in R to get lambda and used cross-validation to find the best model, and I am now moving to Python.
Here I have created a design matrix and made it sparse. Then I ran the logistic regression. It seems to be working.
My question is: since I have stated that the term item_number is a category, how do I know it has become a dummy variable? And how do I know which coefficient goes with each category name?
    from patsy import dmatrices
    from sklearn.linear_model import LogisticRegression
    from sklearn import preprocessing

    def train_model(data, frm, rlambda):
        y, x = dmatrices(frm, data, return_type="matrix")
        y = np.ravel(y)
        scaler = sklearn.preprocessing.MaxAbsScaler(copy=False)
        x_trans = scaler.fit_transform(x)
        model = LogisticRegression(penalty='l2', C=1/rlambda)
        model = model.fit(x_trans, y)

    frm = 'purchase ~ price + C(item_number)'
    rlambda = 0.01
    model, train_score = train_model(data1, frm, rlambda)
First I will fix an error in your code, and then answer your question.
Your code: the train_model function won't return what you think it returns. Currently, it doesn't return anything, and you want it to return both the model and the training score. When you fit a model, you need to define what you mean by training score, because the model won't return one by default. For now, let's just return the model that you trained.
So you should update your train_model function as follows:
    import numpy as np
    from patsy import dmatrices
    from sklearn import preprocessing
    from sklearn.linear_model import LogisticRegression

    def train_model(data, frm, rlambda):
        y, x = dmatrices(frm, data, return_type="matrix")
        y = np.ravel(y)
        scaler = preprocessing.MaxAbsScaler(copy=False)
        x_trans = scaler.fit_transform(x)
        model = LogisticRegression(penalty='l2', C=1/rlambda)
        # model.fit() operates in-place
        model.fit(x_trans, y)
        return model
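If you do want a training score back as well, you first have to pick a definition. A minimal sketch, assuming "training score" means mean accuracy on the training data (which is what `model.score()` returns for a classifier): the function name `train_model_with_score` and the toy DataFrame are hypothetical stand-ins, not part of your code or data.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MaxAbsScaler


def train_model_with_score(data, frm, rlambda):
    """Fit the model and return it with its mean accuracy on the training data."""
    y, x = dmatrices(frm, data, return_type="matrix")
    y = np.ravel(y)
    x_trans = MaxAbsScaler().fit_transform(np.asarray(x))
    # L2 is the default penalty, so only the strength C = 1 / lambda is set here
    model = LogisticRegression(C=1 / rlambda)
    model.fit(x_trans, y)
    # Assumption: "training score" = mean accuracy on the data used for fitting
    train_score = model.score(x_trans, y)
    return model, train_score


# Hypothetical toy data standing in for data1
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "price": rng.uniform(1, 10, size=60),
    "item_number": rng.choice([101, 102, 103], size=60),
})
toy["purchase"] = (toy["price"] < 5).astype(int)

model, train_score = train_model_with_score(
    toy, "purchase ~ price + C(item_number)", rlambda=0.01
)
```

With this variant, the two-value unpacking `model, train_score = ...` from your original call works. Accuracy on the training data is optimistic, so a cross-validated score is usually the better number for choosing lambda.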
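Now, when you want to determine which variables the coefficients correspond to: model.coef_ returns the coefficients in the decision function, of size (n_classes, n_features). For a binary target such as purchase, scikit-learn stores a single row, so the shape is (1, n_features). The order of the coefficients corresponds to the order in which the features were passed to the .fit() method. In your case, x_trans is a design matrix of size (n_samples, n_features), and each of the coefficients in model.coef_ corresponds to one of the n_features columns of x, in the same order in which they appear in x.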
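To attach names to the coefficients, you can read the column names that patsy recorded when it built the design matrix. A minimal sketch on hypothetical toy data (the DataFrame `toy` and the dictionary `coef_by_name` are made up for illustration): with the default treatment coding, a term like C(item_number) becomes one dummy column per level except the first, with names such as `C(item_number)[T.102]`, which is also how you can confirm that the term was dummy-coded.

```python
import numpy as np
import pandas as pd
from patsy import dmatrices
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for data1
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "price": rng.uniform(1, 10, size=60),
    "item_number": rng.choice([101, 102, 103], size=60),
})
toy["purchase"] = (toy["price"] < 5).astype(int)

y, x = dmatrices("purchase ~ price + C(item_number)", toy, return_type="matrix")

# patsy stores the names of the design-matrix columns, in column order
names = x.design_info.column_names

model = LogisticRegression(C=100.0)
model.fit(np.asarray(x), np.ravel(y))

# coef_ has shape (1, n_features) for a binary target, so take row 0
coef_by_name = dict(zip(names, model.coef_[0]))
for name, coef in coef_by_name.items():
    print(f"{name}: {coef:.3f}")
```

Scaling with MaxAbsScaler does not reorder columns, so the same names apply to a model fitted on x_trans. Note that patsy adds its own `Intercept` column while LogisticRegression also fits `model.intercept_` by default, so the intercept is split between the two unless you write the formula with `- 1` or pass `fit_intercept=False`.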