# Logistic Regression Used in Predictive Modeling

In the telecommunications industry, customers can choose from multiple service providers
and actively switch from one provider to another. Customer “churn” is defined as the
percentage of customers who stop using a provider’s product or service during a given
time frame. In this highly competitive market, some telecommunications companies
experience average annual churn rates as high as 25 percent. Because it costs 10 times
more to acquire a new customer than to retain an existing one, customer retention has
become even more important than customer acquisition.
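
The churn definition above reduces to a simple ratio for a chosen time frame. The following is a minimal sketch that assumes the number of customers at the start of the period and the number who left during it are already known. The function name and the counts in the example are illustrative and are not drawn from the data set.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn as a percentage of the customers present at the start of the period."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical example: 250 of 1,000 customers leave during the year.
print(churn_rate(1_000, 250))  # 25.0
```

The 250-of-1,000 example reproduces the 25 percent figure mentioned above. The sketch ignores customers who join partway through the period, and how to count them is a definitional choice the analysis would need to state.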

For many providers, retaining highly profitable customers is the number one business goal.
To reduce customer churn, telecommunications companies need to predict which customers
are at high risk of churn.

You are an analyst on a team of analysts at a popular telecommunications company that
serves customers in all regions of the United States. You have been asked to explore the
data set, identify trends, and compare key metrics.

## Is logistic regression predictive or descriptive?

### Part I: Research Question

A. Describe the purpose of this data analysis by doing the following:

1. Summarize one research question that is relevant to a real-world organizational situation captured in
the data set you have selected and that you will answer using logistic regression.
2. Define the objectives or goals of the data analysis. Ensure that your objectives or goals are reasonable
within the scope of the data dictionary and are represented in the available data.

### Part II: Method Justification

B. Describe logistic regression methods by doing the following:

1. Summarize the assumptions of a logistic regression model.
2. Describe the benefits of using the tool(s) you have chosen (i.e., Python, R, or both) in support of various
phases of the analysis.
3. Explain why logistic regression is an appropriate technique to analyze the research question
summarized in Part I.
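
A few lines of code show why logistic regression suits a yes/no target such as churn. The model is linear in the log-odds, and the logistic (sigmoid) function maps that linear predictor to a probability strictly between 0 and 1. The sketch below uses made-up coefficients and predictor values for illustration only. It is not a fitted model.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Map a log-odds value z to a probability in (0, 1), avoiding exp overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p: float) -> float:
    """Inverse of the sigmoid: convert a probability in (0, 1) to log-odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def churn_probability(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """P(churn = 1 | x) = sigmoid(b0 + b1*x1 + ... + bk*xk)."""
    if len(coefs) != len(x):
        raise ValueError("coefs and x must have the same length")
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(z)


# Hypothetical coefficients for two predictors (not estimated from any data).
p = churn_probability(-1.0, [0.5, -0.02], [2.0, 10.0])
print(round(p, 4))  # 0.4502
```

In practice, the intercept and coefficients come from maximum-likelihood estimation, for example with `statsmodels.api.Logit` or scikit-learn's `LogisticRegression`. The point here is only that predicted churn probabilities cannot fall outside the 0 to 1 range, which a plain linear regression on a 0/1 target does not guarantee.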

### Part III: Data Preparation

C. Summarize the data preparation process for logistic regression by doing the following:

1. Describe your data preparation goals and the data manipulations that will be used to achieve the goals.
2. Discuss the summary statistics, including the target variable and all predictor variables that you will
need to gather from the data set to answer the research question.
3. Explain the steps used to prepare the data for the analysis, including the annotated code.
4. Generate univariate and bivariate visualizations of the distributions of variables in the cleaned data set.
Include the target variable in your bivariate visualizations.
5. Provide a copy of the prepared data set.
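
The standard library alone is enough to sketch several of the preparation steps listed above: encoding the binary target, dummy-coding a categorical predictor, and summarizing each variable. The column names `Churn`, `Contract`, `Tenure`, and `MonthlyCharge` and the sample records are hypothetical placeholders, so substitute the names from your data dictionary. With pandas, `pd.get_dummies(..., drop_first=True)` and `DataFrame.describe()` cover the same ground.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical raw records; in practice these would be read from the churn CSV file.
RAW = [
    {"Churn": "Yes", "Contract": "Month-to-month", "Tenure": 2.0, "MonthlyCharge": 180.0},
    {"Churn": "No", "Contract": "One year", "Tenure": 40.0, "MonthlyCharge": 150.0},
    {"Churn": "No", "Contract": "Two year", "Tenure": 60.0, "MonthlyCharge": 120.0},
    {"Churn": "Yes", "Contract": "Month-to-month", "Tenure": 5.0, "MonthlyCharge": 210.0},
    {"Churn": "No", "Contract": "Month-to-month", "Tenure": 30.0, "MonthlyCharge": 140.0},
]
NUMERIC = ("Tenure", "MonthlyCharge")


def prepare(records: List[dict], categorical: str = "Contract") -> Tuple[List[Dict[str, float]], str]:
    """Encode Churn Yes/No as 1/0, dummy-code one categorical column, and keep numeric columns."""
    levels = sorted({r[categorical] for r in records})
    # Drop the first level so the dummies are not perfectly collinear with the intercept.
    baseline, kept_levels = levels[0], levels[1:]
    prepared = []
    for r in records:
        if r["Churn"] not in ("Yes", "No"):
            raise ValueError(f"unexpected Churn value: {r['Churn']!r}")
        row: Dict[str, float] = {"Churn": 1 if r["Churn"] == "Yes" else 0}
        for level in kept_levels:
            row[f"{categorical}_{level}"] = 1 if r[categorical] == level else 0
        for col in NUMERIC:
            row[col] = float(r[col])
        prepared.append(row)
    return prepared, baseline


def summarize(rows: List[Dict[str, float]], col: str) -> Dict[str, float]:
    """Univariate summary statistics for one prepared column."""
    values = [row[col] for row in rows]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


rows, baseline = prepare(RAW)
print("baseline Contract level:", baseline)
for col in rows[0]:
    print(col, summarize(rows, col))
```

Because `Churn` is coded 0/1, its mean is the sample churn proportion (0.4 for these five made-up rows). A natural bivariate companion is the churn proportion within each `Contract` level, which is what a bar chart of churn by contract type would display.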

### Part IV: Model Comparison and Analysis

D. Compare an initial and a reduced logistic regression model by doing the following:

1. Construct an initial logistic regression model from all predictors that were identified in Part C2.
2. Justify a statistically based variable selection procedure and a model evaluation metric to reduce the
initial model in a way that aligns with the research question.
3. Provide a reduced logistic regression model.

Note: The output should include a screenshot of each model.
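
One statistically based procedure for reducing the initial model is backward elimination guided by the Akaike information criterion, AIC = 2k - 2 ln L. Here k counts the estimated parameters and L is the maximized likelihood. This is one reasonable choice among several, and p-value-based elimination and cross-validated selection are common alternatives. The sketch below fits each candidate model by Newton-Raphson in pure Python on synthetic data generated inside the block. The predictor names `x1`, `x2`, `x3` and the data-generating coefficients are hypothetical. For real work, a fitted statsmodels `Logit` result exposes the log-likelihood and AIC as `llf` and `aic`.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(X: List[List[float]], y: Sequence[int], iters: int = 25) -> Tuple[List[float], float]:
    """Maximum-likelihood fit by Newton-Raphson; X must include a leading column of 1s.
    Returns (coefficients, maximized log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p)) for j in range(k)]
        # X^T W X with W = diag(p * (1 - p)): the negative Hessian of the log-likelihood.
        info = [[sum(pi * (1.0 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    p = [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]
    eps = 1e-12  # guard against log(0)
    ll = sum(yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1.0 - pi, eps))
             for yi, pi in zip(y, p))
    return beta, ll


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def backward_eliminate(data: Dict[str, List[float]], y: Sequence[int]) -> Tuple[List[str], float, float]:
    """Drop, one at a time, the predictor whose removal lowers AIC the most; stop when none does."""
    def model_aic(names: List[str]) -> float:
        X = [[1.0] + [data[name][i] for name in names] for i in range(len(y))]
        _, ll = fit_logit(X, y)
        return aic(ll, len(names) + 1)  # +1 for the intercept

    selected = list(data)
    initial_aic = current_aic = model_aic(selected)
    while len(selected) > 1:  # this sketch never considers the intercept-only model
        trials = {name: model_aic([n for n in selected if n != name]) for name in selected}
        drop, best_aic = min(trials.items(), key=lambda item: item[1])
        if best_aic >= current_aic:
            break
        selected.remove(drop)
        current_aic = best_aic
    return selected, initial_aic, current_aic


# Synthetic data (hypothetical): churn depends on x1 and x2; x3 is pure noise.
random.seed(0)
n = 300
data = {name: [random.gauss(0.0, 1.0) for _ in range(n)] for name in ("x1", "x2", "x3")}
y = [1 if random.random() < sigmoid(-0.5 + 1.2 * a - 0.8 * b) else 0
     for a, b in zip(data["x1"], data["x2"])]

kept, aic_initial, aic_reduced = backward_eliminate(data, y)
print("kept:", kept)
print("AIC initial:", round(aic_initial, 2), "AIC reduced:", round(aic_reduced, 2))
```

The loop only drops a predictor when doing so lowers AIC, so the reduced model's AIC can never exceed the initial model's. AIC trades goodness of fit against the number of parameters. It does not by itself show that the retained predictors are statistically significant, so the reduced model's coefficient p-values still need to be reported.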

E. Analyze the data set using your reduced logistic regression model by doing the following:

1. Explain your data analysis process by comparing the initial and reduced logistic regression models,
including the following elements:
   - the logic of the variable selection technique
   - the model evaluation metric
2. Provide the output and any calculations of the analysis you performed, including a confusion matrix.

   Note: The output should include the predictions from the refined model you used to perform the
   analysis.
3. Provide the code used to support the implementation of the logistic regression models.
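
The confusion matrix requested above can be computed from the reduced model's predicted probabilities and the observed churn labels. The following minimal sketch assumes a 0.5 classification threshold, which is a common default rather than a requirement, and uses a handful of made-up predictions. scikit-learn's `sklearn.metrics.confusion_matrix` produces the same counts from label arrays.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> Dict[str, int]:
    """Count true/false positives/negatives, treating churn = 1 as the positive class."""
    if len(y_true) != len(probs):
        raise ValueError("y_true and probs must have the same length")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, p in zip(y_true, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1:
            counts["TP" if actual == 1 else "FP"] += 1
        else:
            counts["FN" if actual == 1 else "TN"] += 1
    return counts


def summary_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision, and recall derived from the confusion counts."""
    total = sum(c.values())
    predicted_pos = c["TP"] + c["FP"]
    actual_pos = c["TP"] + c["FN"]
    return {
        "accuracy": (c["TP"] + c["TN"]) / total,
        "precision": c["TP"] / predicted_pos if predicted_pos else float("nan"),
        "recall": c["TP"] / actual_pos if actual_pos else float("nan"),
    }


# Hypothetical observed labels and predicted churn probabilities.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
counts = confusion_counts(y_true, probs)
print(counts)  # {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
print(summary_metrics(counts))
```

For labels ordered 0, 1, scikit-learn lays the same counts out as `[[TN, FP], [FN, TP]]`. Churn data are often imbalanced, so accuracy alone can look strong even when recall on actual churners is poor. For this reason, precision and recall are worth reporting alongside it.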

### Part V: Data Summary and Implications

F. Summarize your findings and assumptions by doing the following:

1. Discuss the results of your data analysis, including the following elements:
   - a regression equation for the reduced model
   - an interpretation of the coefficients of the statistically significant variables of the model
   - the statistical and practical significance of the model
   - the limitations of the data analysis
2. Recommend a course of action based on your results.
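
The reduced model's regression equation is written on the log-odds scale, ln(p / (1 - p)) = b0 + b1*x1 + ... + bk*xk. Each coefficient is usually interpreted through its odds ratio e^b: holding the other predictors fixed, a one-unit increase in x multiplies the odds of churn by e^b. The sketch below formats the equation and the odds ratios from a dictionary of coefficients. The names and values are hypothetical and are not results from this analysis.

```python
import math
from typing import Dict


def regression_equation(intercept: float, coefs: Dict[str, float]) -> str:
    """Render the fitted model as ln(p / (1 - p)) = b0 + b1*x1 + ..."""
    terms = [f"{intercept:.4f}"]
    for name, b in coefs.items():
        sign = "-" if b < 0 else "+"
        terms.append(f"{sign} {abs(b):.4f}*{name}")
    return "ln(p / (1 - p)) = " + " ".join(terms)


def odds_ratio_table(coefs: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Odds ratio e^b and the implied percent change in churn odds per one-unit increase."""
    table = {}
    for name, b in coefs.items():
        ratio = math.exp(b)
        table[name] = {"odds_ratio": ratio, "pct_change_in_odds": 100.0 * (ratio - 1.0)}
    return table


# Hypothetical reduced-model estimates, not taken from the actual data set.
intercept = -1.25
coefs = {"Tenure": -0.05, "MonthlyCharge": 0.02}
print(regression_equation(intercept, coefs))
# ln(p / (1 - p)) = -1.2500 - 0.0500*Tenure + 0.0200*MonthlyCharge
for name, row in odds_ratio_table(coefs).items():
    print(name, round(row["odds_ratio"], 4), f"{row['pct_change_in_odds']:+.2f}%")
```

Odds ratios describe changes in odds, not in probability. A given percent change in the odds of churn is not the same as that many percentage points of churn probability. Statistical significance (the coefficient's p-value) and practical significance (the size of the odds ratio) also remain separate questions.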

### Part VI: Demonstration

G. Provide a Panopto video recording that includes all of the following elements:

- a demonstration of the functionality of the code used for the analysis
- an identification of the version of the programming environment
- a comparison of the two logistic regression models you used in your analysis
- an interpretation of the coefficients
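
To identify the version of the programming environment during the recording, one option in Python is to print the interpreter version together with the installed versions of the packages the analysis imports. This is a minimal sketch. The `PACKAGES` tuple is an assumption and should list whatever libraries your notebook actually uses.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed package list; adjust to the libraries the analysis actually uses.
PACKAGES = ("numpy", "pandas", "statsmodels", "scikit-learn", "matplotlib", "seaborn")


def environment_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the Python version plus each package's installed version (None if missing)."""
    versions: Dict[str, Optional[str]] = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(sys.version)
for name, version in environment_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

`importlib.metadata` is available from Python 3.8 onward. A package reported as not installed usually means the notebook is running in a different environment than expected.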