{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import necessary libraries\n", "\n", "# NumPy: mathematical functions and array/matrix operations\n", "import numpy as np\n", "\n", "# pandas: loading and manipulating tabular data\n", "import pandas as pd\n", "\n", "# seaborn and matplotlib: plotting graphs and other visual tools\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "\n", "sns.set_palette(\"muted\")\n", "\n", "# color_palette = sns.color_palette()\n", "# Enable inline plotting of graphs\n", "%matplotlib inline\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Columns ['ID', 'Age', 'Experience', 'Income', 'ZIP Code', 'Family', 'CCAvg', 'Education', 'Mortgage', 'Personal Loan', 'Securities Account', 'CD Account', 'Online', 'CreditCard']\n", "***********************************************************************************************************************\n", "Columns Map {0: 'ID', 1: 'Age', 2: 'Experience', 3: 'Income', 4: 'ZIP Code', 5: 'Family', 6: 'CCAvg', 7: 'Education', 8: 'Mortgage', 9: 'Personal Loan', 10: 'Securities Account', 11: 'CD Account', 12: 'Online', 13: 'CreditCard'}\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " ID Age Experience Income ZIP Code Family CCAvg Education Mortgage \\\n", "0 1 25 1 49 91107 4 1.6 1 0 \n", "1 2 45 19 34 90089 3 1.5 1 0 \n", "2 3 39 15 11 94720 1 1.0 1 0 \n", "3 4 35 9 100 94112 1 2.7 2 0 \n", "4 5 35 8 45 91330 4 1.0 2 0 \n", "\n", " Personal Loan Securities Account CD Account Online CreditCard \n", "0 0 1 0 0 0 \n", "1 0 1 0 0 0 \n", "2 0 0 0 0 0 \n", "3 0 0 0 0 0 \n", "4 0 0 0 0 1 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Import CSV data into a pandas DataFrame\n", "df_original = pd.read_csv('bank.csv')\n", "\n", "# Collect the column names\n", "df_columns = list(df_original.columns)\n", "\n", "# Map positional index -> column name for quick access\n", "df_columns_map = dict(enumerate(df_columns))\n", "\n", "print(\"Columns {}\".format(df_columns))\n", "print(\"***********************************************************************************************************************\")\n", "print(\"Columns Map {}\".format(df_columns_map))\n", "\n", "# With the column names and their index mapping kept separately, we can look up a column\n", "# by position or by name at any point during data analysis or cleaning.\n", "\n", "# See data overview\n", "\n", "df_original.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 5000 entries, 0 to 4999\n", "Data columns (total 14 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 ID 5000 non-null int64 \n", " 1 Age 5000 non-null int64 \n", " 2 Experience 5000 non-null int64 \n", " 3 Income 5000 non-null int64 \n", " 4 ZIP Code 5000 non-null int64 \n", " 5 Family 5000 non-null int64 \n", " 6 CCAvg 5000 non-null 
float64\n", " 7 Education 5000 non-null int64 \n", " 8 Mortgage 5000 non-null int64 \n", " 9 Personal Loan 5000 non-null int64 \n", " 10 Securities Account 5000 non-null int64 \n", " 11 CD Account 5000 non-null int64 \n", " 12 Online 5000 non-null int64 \n", " 13 CreditCard 5000 non-null int64 \n", "dtypes: float64(1), int64(13)\n", "memory usage: 547.0 KB\n" ] } ], "source": [ "# Let's analyse the data using the following checks\n", "# 1. Check that all rows x columns are loaded as given in the question; all data must match before we operate on it.\n", "# 2. Print the shape of the data\n", "# 3. Check the data type of each field\n", "# 4. Find any null or missing values\n", "# 5. Visually inspect the data for outliers and decide whether\n", "#    they should be dropped or kept during model building\n", "# 6. Decide whether all columns in the data set are needed for model building\n", "# 7. Find correlation, median, mean, std deviation, min, max for columns\n", "\n", "# Below is the info for our data\n", "\n", "df_original.info()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min 25% \\\n", "ID 5000.0 2500.500000 1443.520003 1.0 1250.75 \n", "Age 5000.0 45.338400 11.463166 23.0 35.00 \n", "Experience 5000.0 20.104600 11.467954 -3.0 10.00 \n", "Income 5000.0 73.774200 46.033729 8.0 39.00 \n", "ZIP Code 5000.0 93152.503000 2121.852197 9307.0 91911.00 \n", "Family 5000.0 2.396400 1.147663 1.0 1.00 \n", "CCAvg 5000.0 1.937938 1.747659 0.0 0.70 \n", "Education 5000.0 1.881000 0.839869 1.0 1.00 \n", "Mortgage 5000.0 56.498800 101.713802 0.0 0.00 \n", "Personal Loan 5000.0 0.096000 0.294621 0.0 0.00 \n", "Securities Account 5000.0 0.104400 0.305809 0.0 0.00 \n", "CD Account 5000.0 0.060400 0.238250 0.0 0.00 \n", "Online 5000.0 0.596800 0.490589 0.0 0.00 \n", "CreditCard 5000.0 0.294000 0.455637 0.0 0.00 \n", "\n", " 50% 75% max \n", "ID 2500.5 3750.25 5000.0 \n", "Age 45.0 55.00 67.0 \n", "Experience 20.0 30.00 43.0 \n", "Income 64.0 98.00 224.0 \n", "ZIP Code 93437.0 94608.00 96651.0 \n", "Family 2.0 3.00 4.0 \n", "CCAvg 1.5 2.50 10.0 \n", "Education 2.0 3.00 3.0 \n", "Mortgage 0.0 101.00 635.0 \n", "Personal Loan 0.0 0.00 1.0 \n", "Securities Account 0.0 0.00 1.0 \n", "CD Account 0.0 0.00 1.0 \n", "Online 1.0 1.00 1.0 \n", "CreditCard 0.0 1.00 1.0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1. 
Check that all rows x columns are loaded as given in the question; all data must match before we operate on it.\n", "\n", "# df_original.describe() is difficult to read, so apply transpose() to view it more easily\n", "\n", "df_original.describe().transpose()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Analysis Column Wise\n", "\n", "* ID: appears to be just a row identifier in the data frame; it can be dropped before model building.\n", "* Age: based on std, q1, q2, q3, the values seem valid.\n", "* **Experience**: the min is *-3*; experience cannot be negative, so these values need correction. Plausible values would be 0-80, assuming a person starts working at 20 and lives to at most 100.\n", "* ZIP Code: most values look fine, but the minimum (9307) has only four digits and deserves a check. If region is not of interest, this column can be dropped without affecting the model. If at a later stage we want to predict which areas accept more personal loans, we would need to keep it.\n", "* Family: the data looks fine. Family size may matter: fewer children may mean fewer responsibilities and less need for a loan, while larger families sometimes take extra loans beyond education. So this is likely an important field in model building.\n", "* CreditCard: my general assumption is that a person who has a credit card is very unlikely to go for a personal loan. 
But if a longer-term need arises, there may be some customers who have both a credit card and a personal loan.\n", "\n", "\n", "Skipping the other fields, which are self-explanatory.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5000, 14)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Judging by the column labels, all columns of the given data set were loaded successfully\n", "# Let's check the shape of the data\n", "df_original.shape\n", "\n", "# Here we see a total of 5000 rows and 14 columns." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# sns.boxplot(y=\"Age\", orient=\"v\", x=\"Personal Loan\", hue=\"Education\", data=df_original)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "ID int64\n", "Age int64\n", "Experience int64\n", "Income int64\n", "ZIP Code int64\n", "Family int64\n", "CCAvg float64\n", "Education int64\n", "Mortgage int64\n", "Personal Loan int64\n", "Securities Account int64\n", "CD Account int64\n", "Online int64\n", "CreditCard int64\n", "dtype: object" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data types of fields\n", "\n", "df_original.dtypes\n", "\n", "# Everything is numeric, so no type conversion is needed" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Presence of null values or missing values\n", "df_original.isnull().values.any()\n", "\n", "# False tells us that every row x column cell has a value (no nulls).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, check the validity of the data. As we have seen, there is no missing data in our data frame, but are the values \n", "valid? For example, we 
have seen that the Experience field contains -3. A negative experience adds no value to our model\n", "and may distort our conclusions; how much depends on how many such values are present.\n", "Let's print the composition (spread of values) of the suspicious column." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Experience has unique data in this range [1, 19, 15, 9, 8, 13, 27, 24, 10, 39, 5, 23, 32, 41, 30, 14, 18, 21, 28, 31, 11, 16, 20, 35, 6, 25, 7, 12, 26, 37, 17, 2, 36, 29, 3, 22, -1, 34, 0, 38, 40, 33, 4, -2, 42, -3, 43]\n" ] }, { "data": { "text/plain": [ "-1 33\n", "-2 15\n", "-3 4\n", "Name: Experience, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's see the range of Experience values in our data set.\n", "\n", "print(\"Experience has unique data in this range {}\".format(df_original[\"Experience\"].unique().tolist()))\n", "\n", "# len(df_original[df_original[\"Experience\"] < 0]['Experience'].unique().tolist())\n", "df_experience = df_original[df_original['Experience'] < 0]\n", "df_experience['Experience'].value_counts()\n", "\n", "# So there are 52 rows (33 + 15 + 4) with a negative, i.e. invalid, Experience value: ~1 % of the Experience column\n", "# We have a few options to deal with this\n", "# - purge this invalid data\n", "# - replace it with meaningful values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Age experience has unique data in this range [25, 45, 39, 35, 37, 53, 50, 34, 65, 29, 48, 59, 67, 60, 38, 42, 46, 55, 56, 57, 44, 36, 43, 40, 30, 31, 51, 32, 61, 41, 28, 49, 47, 62, 58, 54, 33, 27, 66, 24, 52, 26, 64, 63, 23]\n" ] }, { "data": { "text/plain": [ "35 151\n", "43 
149\n", "52 145\n", "58 143\n", "54 143\n", "50 138\n", "41 136\n", "30 136\n", "56 135\n", "34 134\n", "39 133\n", "59 132\n", "57 132\n", "51 129\n", "60 127\n", "45 127\n", "46 127\n", "42 126\n", "40 125\n", "31 125\n", "55 125\n", "62 123\n", "29 123\n", "61 122\n", "44 121\n", "32 120\n", "33 120\n", "48 118\n", "38 115\n", "49 115\n", "47 113\n", "53 112\n", "63 108\n", "36 107\n", "37 106\n", "28 103\n", "27 91\n", "65 80\n", "64 78\n", "26 78\n", "25 53\n", "24 28\n", "66 24\n", "23 12\n", "67 12\n", "Name: Age, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Let's quickly check Age\n", "\n", "print(\"Age experience has unique data in this range {}\".format(df_original['Age'].unique().tolist()))\n", "# This looks ok.\n", "\n", "# Value counts\n", "df_original['Age'].value_counts()\n", "\n", "# There are no suspicious values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can inspect the values visually here because there are only thousands of rows; what if there were lakhs (hundreds of thousands)? It is neither feasible nor good practice to scan the output manually for invalid values. In that case we can try to parse the values in each field.\n", "Create a helper function for string, bool, int and other data types. The signature might look like this:\n", "\n", "\n", "```python\n", "def all_int(column_int_values_list):\n", "    # Iterate over the list, checking that each value is an int\n", "    return all(isinstance(value, int) for value in column_int_values_list)\n", "```\n", "\n", "Let's try to check the validity of values using this parse approach." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Validate function\n", "\n", "def validate_column(column_as_list, column_name):\n", "    # Checks only that every value is numeric; it does not check ranges (e.g. negative Experience)\n", "    print(\"Analysing {} column for unique value {}\".format(column_name, column_as_list))\n", "    for value in column_as_list:\n", "        try:\n", "            value += 1\n", "        except TypeError:\n", "            print(\"Error identifying {} in {} column\".format(value, column_name))\n", "            return False\n", "    return True\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Age column for unique value [25, 45, 39, 35, 37, 53, 50, 34, 65, 29, 48, 59, 67, 60, 38, 42, 46, 55, 56, 57, 44, 36, 43, 40, 30, 31, 51, 32, 61, 41, 28, 49, 47, 62, 58, 54, 33, 27, 66, 24, 52, 26, 64, 63, 23]\n", "Is Age column valid True\n" ] } ], "source": [ "print(\"Is Age column valid {}\".format(validate_column(df_original['Age'].unique().tolist(), 'Age')))" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Experience column for unique value [1, 19, 15, 9, 8, 13, 27, 24, 10, 39, 5, 23, 32, 41, 30, 14, 18, 21, 28, 31, 11, 16, 20, 35, 6, 25, 7, 12, 26, 37, 17, 2, 36, 29, 3, 22, -1, 34, 0, 38, 40, 33, 4, -2, 42, -3, 43]\n", "Is Experience column valid True\n" ] } ], "source": [ "print(\"Is Experience column valid {}\".format(validate_column(df_original['Experience'].unique().tolist(), 'Experience')))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Income column for unique value [49, 34, 11, 100, 45, 29, 72, 22, 81, 180, 105, 114, 40, 112, 130, 193, 21, 25, 63, 62, 43, 152, 83, 158, 48, 119, 35, 41, 18, 50, 121, 71, 141, 80, 84, 60, 132, 104, 52, 194, 8, 131, 190, 44, 139, 93, 188, 39, 125, 32, 20, 115, 69, 85, 135, 12, 133, 19, 82, 109, 42, 
78, 51, 113, 118, 64, 161, 94, 15, 74, 30, 38, 9, 92, 61, 73, 70, 149, 98, 128, 31, 58, 54, 124, 163, 24, 79, 134, 23, 13, 138, 171, 168, 65, 10, 148, 159, 169, 144, 165, 59, 68, 91, 172, 55, 155, 53, 89, 28, 75, 170, 120, 99, 111, 33, 129, 122, 150, 195, 110, 101, 191, 140, 153, 173, 174, 90, 179, 145, 200, 183, 182, 88, 160, 205, 164, 14, 175, 103, 108, 185, 204, 154, 102, 192, 202, 162, 142, 95, 184, 181, 143, 123, 178, 198, 201, 203, 189, 151, 199, 224, 218]\n", "Is Income column valid True\n" ] } ], "source": [ "print(\"Is Income column valid {}\".format(validate_column(df_original['Income'].unique().tolist(), 'Income')))" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing ZIP Code column for unique value [91107, 90089, 94720, 94112, 91330, 92121, 91711, 93943, 93023, 94710, 90277, 93106, 94920, 91741, 95054, 95010, 94305, 91604, 94015, 90095, 91320, 95521, 95064, 90064, 94539, 94104, 94117, 94801, 94035, 92647, 95814, 94114, 94115, 92672, 94122, 90019, 95616, 94065, 95014, 91380, 95747, 92373, 92093, 94005, 90245, 95819, 94022, 90404, 93407, 94523, 90024, 91360, 95670, 95123, 90045, 91335, 93907, 92007, 94606, 94611, 94901, 92220, 93305, 95134, 94612, 92507, 91730, 94501, 94303, 94105, 94550, 92612, 95617, 92374, 94080, 94608, 93555, 93311, 94704, 92717, 92037, 95136, 94542, 94143, 91775, 92703, 92354, 92024, 92831, 92833, 94304, 90057, 92130, 91301, 92096, 92646, 92182, 92131, 93720, 90840, 95035, 93010, 94928, 95831, 91770, 90007, 94102, 91423, 93955, 94107, 92834, 93117, 94551, 94596, 94025, 94545, 95053, 90036, 91125, 95120, 94706, 95827, 90503, 90250, 95817, 95503, 93111, 94132, 95818, 91942, 90401, 93524, 95133, 92173, 94043, 92521, 92122, 93118, 92697, 94577, 91345, 94123, 92152, 91355, 94609, 94306, 96150, 94110, 94707, 91326, 90291, 92807, 95051, 94085, 92677, 92614, 92626, 94583, 92103, 92691, 92407, 90504, 94002, 95039, 94063, 94923, 95023, 90058, 92126, 
94118, 90029, 92806, 94806, 92110, 94536, 90623, 92069, 92843, 92120, 95605, 90740, 91207, 95929, 93437, 90630, 90034, 90266, 95630, 93657, 92038, 91304, 92606, 92192, 90745, 95060, 94301, 92692, 92101, 94610, 90254, 94590, 92028, 92054, 92029, 93105, 91941, 92346, 94402, 94618, 94904, 9307, 95482, 91709, 91311, 94509, 92866, 91745, 94111, 94309, 90073, 92333, 90505, 94998, 94086, 94709, 95825, 90509, 93108, 94588, 91706, 92109, 92068, 95841, 92123, 91342, 90232, 92634, 91006, 91768, 90028, 92008, 95112, 92154, 92115, 92177, 90640, 94607, 92780, 90009, 92518, 91007, 93014, 94024, 90027, 95207, 90717, 94534, 94010, 91614, 94234, 90210, 95020, 92870, 92124, 90049, 94521, 95678, 95045, 92653, 92821, 90025, 92835, 91910, 94701, 91129, 90071, 96651, 94960, 91902, 90033, 95621, 90037, 90005, 93940, 91109, 93009, 93561, 95126, 94109, 93107, 94591, 92251, 92648, 92709, 91754, 92009, 96064, 91103, 91030, 90066, 95403, 91016, 95348, 91950, 95822, 94538, 92056, 93063, 91040, 92661, 94061, 95758, 96091, 94066, 94939, 95138, 95762, 92064, 94708, 92106, 92116, 91302, 90048, 90405, 92325, 91116, 92868, 90638, 90747, 93611, 95833, 91605, 92675, 90650, 95820, 90018, 93711, 95973, 92886, 95812, 91203, 91105, 95008, 90016, 90035, 92129, 90720, 94949, 90041, 95003, 95192, 91101, 94126, 90230, 93101, 91365, 91367, 91763, 92660, 92104, 91361, 90011, 90032, 95354, 94546, 92673, 95741, 95351, 92399, 90274, 94087, 90044, 94131, 94124, 95032, 90212, 93109, 94019, 95828, 90086, 94555, 93033, 93022, 91343, 91911, 94803, 94553, 95211, 90304, 92084, 90601, 92704, 92350, 94705, 93401, 90502, 94571, 95070, 92735, 95037, 95135, 94028, 96003, 91024, 90065, 95405, 95370, 93727, 92867, 95821, 94566, 95125, 94526, 94604, 96008, 93065, 96001, 95006, 90639, 92630, 95307, 91801, 94302, 91710, 93950, 90059, 94108, 94558, 93933, 92161, 94507, 94575, 95449, 93403, 93460, 95005, 93302, 94040, 91401, 95816, 92624, 95131, 94965, 91784, 91765, 90280, 95422, 95518, 95193, 92694, 90275, 90272, 91791, 92705, 
91773, 93003, 90755, 96145, 94703, 96094, 95842, 94116, 90068, 94970, 90813, 94404, 94598]\n", "Is ZIP Code column valid True\n" ] } ], "source": [ "print(\"Is ZIP Code column valid {}\".format(validate_column(df_original['ZIP Code'].unique().tolist(), 'ZIP Code')))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Family column for unique value [4, 3, 1, 2]\n", "Is Family column valid True\n" ] } ], "source": [ "print(\"Is Family column valid {}\".format(validate_column(df_original['Family'].unique().tolist(), 'Family')))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing CCAvg column for unique value [1.6, 1.5, 1.0, 2.7, 0.4, 0.3, 0.6, 8.9, 2.4, 0.1, 3.8, 2.5, 2.0, 4.7, 8.1, 0.5, 0.9, 1.2, 0.7, 3.9, 0.2, 2.2, 3.3, 1.8, 2.9, 1.4, 5.0, 2.3, 1.1, 5.7, 4.5, 2.1, 8.0, 1.7, 0.0, 2.8, 3.5, 4.0, 2.6, 1.3, 5.6, 5.2, 3.0, 4.6, 3.6, 7.2, 1.75, 7.4, 2.67, 7.5, 6.5, 7.8, 7.9, 4.1, 1.9, 4.3, 6.8, 5.1, 3.1, 0.8, 3.7, 6.2, 0.75, 2.33, 4.9, 0.67, 3.2, 5.5, 6.9, 4.33, 7.3, 4.2, 4.4, 6.1, 6.33, 6.6, 5.3, 3.4, 7.0, 6.3, 8.3, 6.0, 1.67, 8.6, 7.6, 6.4, 10.0, 5.9, 5.4, 8.8, 1.33, 9.0, 6.7, 4.25, 6.67, 5.8, 4.8, 3.25, 5.67, 8.5, 4.75, 4.67, 3.67, 8.2, 3.33, 5.33, 9.3, 2.75]\n", "Is CCAvg column valid True\n" ] } ], "source": [ "print(\"Is CCAvg column valid {}\".format(validate_column(df_original['CCAvg'].unique().tolist(), 'CCAvg')))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Education column for unique value [1, 2, 3]\n", "Is Education column valid True\n" ] } ], "source": [ "print(\"Is Education column valid {}\".format(validate_column(df_original['Education'].unique().tolist(), 'Education')))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "Analysing Mortgage column for unique value [0, 155, 104, 134, 111, 260, 163, 159, 97, 122, 193, 198, 285, 412, 153, 211, 207, 240, 455, 112, 336, 132, 118, 174, 126, 236, 166, 136, 309, 103, 366, 101, 251, 276, 161, 149, 188, 116, 135, 244, 164, 81, 315, 140, 95, 89, 90, 105, 100, 282, 209, 249, 91, 98, 145, 150, 169, 280, 99, 78, 264, 113, 117, 325, 121, 138, 77, 158, 109, 131, 391, 88, 129, 196, 617, 123, 167, 190, 248, 82, 402, 360, 392, 185, 419, 270, 148, 466, 175, 147, 220, 133, 182, 290, 125, 124, 224, 141, 119, 139, 115, 458, 172, 156, 547, 470, 304, 221, 108, 179, 271, 378, 176, 76, 314, 87, 203, 180, 230, 137, 152, 485, 300, 272, 144, 94, 208, 275, 83, 218, 327, 322, 205, 227, 239, 85, 160, 364, 449, 75, 107, 92, 187, 355, 106, 587, 214, 307, 263, 310, 127, 252, 170, 265, 177, 305, 372, 79, 301, 232, 289, 212, 250, 84, 130, 303, 256, 259, 204, 524, 157, 231, 287, 247, 333, 229, 357, 361, 294, 86, 329, 142, 184, 442, 233, 215, 394, 475, 197, 228, 297, 128, 241, 437, 178, 428, 162, 234, 257, 219, 337, 382, 397, 181, 120, 380, 200, 433, 222, 483, 154, 171, 146, 110, 201, 277, 268, 237, 102, 93, 354, 195, 194, 238, 226, 318, 342, 266, 114, 245, 341, 421, 359, 565, 319, 151, 267, 601, 567, 352, 284, 199, 80, 334, 389, 186, 246, 589, 242, 143, 323, 535, 293, 398, 343, 255, 311, 446, 223, 262, 422, 192, 217, 168, 299, 505, 400, 165, 183, 326, 298, 569, 374, 216, 191, 408, 406, 452, 432, 312, 477, 396, 582, 358, 213, 467, 331, 295, 235, 635, 385, 328, 522, 496, 415, 461, 344, 206, 368, 321, 296, 373, 292, 383, 427, 189, 202, 96, 429, 431, 286, 508, 210, 416, 553, 403, 225, 500, 313, 410, 273, 381, 330, 345, 253, 258, 351, 353, 308, 278, 464, 509, 243, 173, 481, 281, 306, 577, 302, 405, 571, 581, 550, 283, 612, 590, 541]\n", "Is Mortgage column valid True\n" ] } ], "source": [ "print(\"Is Mortgage column valid {}\".format(validate_column(df_original['Mortgage'].unique().tolist(), 'Mortgage')))" ] }, { "cell_type": "code", 
"execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Securities Account column for unique value [1, 0]\n", "Is Securities Account column valid True\n" ] } ], "source": [ "print(\"Is Securities Account column valid {}\".format(validate_column(df_original['Securities Account'].unique().tolist(), 'Securities Account')))" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing CD Account column for unique value [0, 1]\n", "Is CD Account column valid True\n" ] } ], "source": [ "print(\"Is CD Account column valid {}\".format(validate_column(df_original['CD Account'].unique().tolist(), 'CD Account')))" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Online column for unique value [0, 1]\n", "Is Online column valid True\n" ] } ], "source": [ "print(\"Is Online column valid {}\".format(validate_column(df_original['Online'].unique().tolist(), 'Online')))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing CreditCard column for unique value [0, 1]\n", "Is CreditCard column valid True\n" ] } ], "source": [ "print(\"Is CreditCard column valid {}\".format(validate_column(df_original['CreditCard'].unique().tolist(), 'CreditCard')))" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Personal Loan column for unique value [0, 1]\n", "Is Personal Loan column valid True\n" ] } ], "source": [ "\n", "print(\"Is Personal Loan column valid {}\".format(validate_column(df_original['Personal Loan'].unique().tolist(), 'Personal Loan')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's visually inspect the distribution of values across each column and 
check for the presence of outliers" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Age\n", "sns.distplot(df_original['Age'], kde=True)\n", "# The data set covers a wide range of age groups" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Experience\n", "sns.distplot(df_original['Experience'], kde=True)\n", "\n", "# Again, a wide range of experience levels" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Let's analyse the relation between age and personal loan\n", "\n", "sns.catplot(y='Age', x='Personal Loan', data=df_original)\n", "# Apart from 0 being denser than 1, this plot gives no clear visual indication of who is more likely to take a loan" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ID', 'Age', 'Experience', 'Income', 'ZIP Code', 'Family', 'CCAvg', 'Education', 'Mortgage', 'Personal Loan', 'Securities Account', 'CD Account', 'Online', 'CreditCard']\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Let's try income vs loan\n", "sns.catplot(y='Income', x='Personal Loan', data=df_original)\n", "# Quite evidently, people whose Income is between 100 and 200 tend to take loans more often than those with lower income.\n", "# So Income appears to have a strong influence on Personal Loan; let's see which other fields are related to it\n", "print(df_columns)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.catplot(y='CCAvg', x='Personal Loan', data=df_original)\n", "# People whose credit card average is between 2 and 6 seem to have more personal loans\n", "# than those in other ranges, as seen in the graph below" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# One more column in our dataset is not an income value itself but may affect it: Family.\n", "# Can we say that people with more family members have taken a personal loan?\n", "# sns.relplot(x='Personal Loan', y='Family', data=df_original, fit_reg=False)\n", "sns.relplot(x=\"Family\", y=\"Income\", hue=\"Personal Loan\", data=df_original);\n", "\n", "# From this graph, people who have 3 or 4 family members and whose income is roughly between 100 and 200 tend to opt\n", "# for a personal loan more often than people in other ranges.\n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Family 1 2 3 4\n", "Personal Loan \n", "0 1365 1190 877 1088\n", "1 107 106 133 134" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Little More exploration using categorical columns\n", "pd.crosstab(df_original['Personal Loan'],df_original['Family'])" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAASVklEQVR4nO3df6xfd33f8ecrscEOZImKPa6V+OaCQGOUlQSu0iCmKc0qLVAgSpuqYUDBAlmrYJS1WzWqiRtH20SrjY7ULdRLMIQiGkoINTTVGmAZRIyAE5yQ2Klq0ZvFJZ5tQkI9+yax/d4f92t6c/O9vtd2zj3X9/N8SFf3fD/nc87n7eR7v6/v+Z2qQpLUrrP6LkCS1C+DQJIaZxBIUuMMAklqnEEgSY1b0XcBJ2vNmjU1NjbWdxmSdEa55557DlTV2mHzzrggGBsbY/v27X2XIUlnlCQPzzXPXUOS1DiDQJIaZxBIUuMMAklqnEEgSY3rPAiSnJ3ku0m+PGTe85PckmR3kruTjHVdjyTpmRZji+DXgV1zzHs38KOqehnwe8DvLEI9kqQZOr2OIMmFwC8A/wn4jSFdrgKuG0x/HticJLXM7429detWJicn+y6DvXv3AjAyMtJrHWNjY2zYsKHXGpaCpfS+mJqa6ruMJWPVqlXL/m+k6wvK/hvwW8C5c8y/AHgEoKqOJHkCeBFwYGanJBuBjQCjo6OdFdsa/9iXlsnJSb7/VztZd06/h+6OTB3j2LFeS1hSjjx9iMNPP97b+I8e6v5/RmdBkORNwL6quifJ5XN1G9L2rK2BqtoCbAEYHx8/47cWlsq334mJCQA2bdrUcyU6bt05Z/GeV67uuwwtITfuPNz5GF1+9Xg98JYkk8CfAFck+eNZffYA6wGSrADOAx7rsCZJ0iydBUFVfbCqLqyqMeBa4GtV9fZZ3bYB7xxMXzPoc8Z/45ekM8mi33QuyfXA9qraBtwEfDrJbqa3BK5d7HokqXWLEgRVdSdw52D6QzPap4BfXowaJEnDeWWxJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcZ0FQZJVSb6d5L4kDybZNKTPu5LsT7Jj8POeruqRJA23osN1PwlcUVUHk6wE7kryF1X1rVn9bqmq93VYhyTpBDoLgqoq4ODg5crBT3U1niTp1HR6jCDJ2Ul2APuAO6rq7iHdfinJ/Uk+n2T9HOvZmGR7ku379+/vsmRJak6nQVBVR6vqYuBC4NIkr5rV5UvAWFX9DPAV4FNzrGdLVY1X1fjatWu7LFmSmrMoZw1V1ePAncCVs9p/WFVPDl7+d+C1i1GPJOnvdXnW0
Nok5w+mVwM/Dzw0q8+6GS/fAuzqqh5J0nBdnjW0DvhUkrOZDpzPVdWXk1wPbK+qbcD7k7wFOAI8Bryrw3okSUN0edbQ/cAlQ9o/NGP6g8AHu6pBkjQ/ryyWpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1rrMgSLIqybeT3JfkwSSbhvR5fpJbkuxOcneSsa7qkSQN1+UWwZPAFVX1auBi4Mokl83q827gR1X1MuD3gN/psB5J0hArulpxVRVwcPBy5eCnZnW7CrhuMP15YHOSDJbtxNatW5mcnOxq9WeU4/8dJiYm+i1kiRgbG2PDhg29jb93714OHTrGjTsP91aDlp5HDx3jnL17Ox2jsyAASHI2cA/wMuAPquruWV0uAB4BqKojSZ4AXgQcmLWejcBGgNHR0dOqaXJykp0P7Sar1pzWepaDeioA7Jp8vOdK+ldTB+bvJC1TnQZBVR0FLk5yPnBbkldV1QMzumTYYkPWswXYAjA+Pn7aWwtZtYYVF119uqvRMnLk4dv6LoGRkREOP/0473nl6r5L0RJy487DrB4Z6XSMRTlrqKoeB+4Erpw1aw+wHiDJCuA84LHFqEmSNK3Ls4bWDrYESLIa+HngoVndtgHvHExfA3yty+MDkqRn63LX0DrgU4PjBGcBn6uqLye5HtheVduAm4BPJ9nN9JbAtR3WI0kaosuzhu4HLhnS/qEZ01PAL3dVgyRpfl5ZLEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1LgFBUGSW5P8QhKDQ5KWmYV+sH8M+JfAXyf5cJJXdFiTJGkRLSgIquorVfU24DXAJHBHkm8m2ZBkZZcFSpK6teBdPUleBLwLeA/wXeCjTAfDHZ1UJklaFAt6QlmSLwCvAD4NvLmqHh3MuiXJ9q6KkyR1b6GPqtxcVV8bNqOqxp/DeiRJi+yEQZDkF4dNH1dVX+iiKEnS4plvi+DNJ5hXgEEgSWe4EwZBVW1YrEIkSf2Yb9fQ26vqj5P8xrD5VfWRbsqSJC2W+XYNvWDw+9yuC5Ek9WO+XUN/NPi96WRXnGQ9cDMwAhwDtlTVR2f1uRz4M+BvBk1fqKrrT3YsSdKpW+h1BC8B/jUwNnOZqnrLCRY7AvxmVd2b5FzgniR3VNXOWf2+UVVvOrmyJUnPlYVeR/BF4CbgS0x/u5/X4KKzRwfTf5dkF3ABMDsIJEk9WmgQTFXVDac6SJIx4BLg7iGzX5fkPuAHwL+tqgeHLL8R2AgwOjp6qmVIkoZYaBB8NMkE8JfAk8cbq+re+RZM8kLgVuADVfXjWbPvBS6qqoNJ3sj0lsfLZ6+jqrYAWwDGx8drgTVLkhZgoUHwT4B3AFfw97uGavB6ToM7k94KfGbYVcgzg6Gqbk/yh0nWVNWBBdYlSTpNCw2Cq4GXVtVTC11xkjB9XGHXXNcbJBkB/m9VVZJLmb4b6g8XOoYk6fQtNAjuA84H9p3Eul/P9FbE95LsGLT9NjAKUFUfB64Bfi3JEeAwcG1VuetHkhbRQoPgxcBDSb7DM48RzHn6aFXdBeREK62qzcDmBdYgSerAQoNgotMqJEm9WVAQVNX/6roQSVI/FvSoyiSXJflOkoNJnkpyNMnsU0ElSWeghT6zeDPwVuCvgdVMP7fYffuStAws9BgBVbU7ydlVdRTYmuSbHdYlSVokCw2CQ0meB+xI8rtM30PoBfMsI0k6Ayx019A7Bn3fB/w/YD3wS10VJUlaPPM9oWy0qv5PVT08aJoCTvrZBJKkpWu+LYIvHp9IcmvHtUiSejBfEMy8MvilXRYiSerHfEFQc0xLkpaJ+c4aevXgwrEAq2dcRBagquofdFqdJKlz8
z28/uzFKkSS1I+Fnj4qSVqmDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjessCJKsT/I/k+xK8mCSXx/SJ0luSLI7yf1JXtNVPZKk4Rb88PpTcAT4zaq6N8m5wD1J7qiqnTP6vAF4+eDnZ4GPDX5LkhZJZ0FQVY8y/ZB7qurvkuwCLgBmBsFVwM1VVcC3kpyfZN1g2U7s3buXmjrIkYdv62oInYFq6gB79071XQaPHjrGjTsP911G7344dQyAF61y7/Wjh451/lSwLrcIfiLJGHAJcPesWRcAj8x4vWfQ9owgSLIR2AgwOjraVZlSr8bGxvouYck4MjkJwOr1Y73WsRS8lO7fG50HQZIXArcCH6iqH8+ePWSRZz0Jraq2AFsAxsfHT+tJaSMjI/xo6nFWXHT16axGy8yRh29jZOT8XmvYsGFDr+MvJRMTEwBs2rSp50ra0Ol2V5KVTIfAZ6rqC0O67AHWz3h9IfCDLmuSJD1Tl2cNBbgJ2FVVH5mj2zbgVwdnD10GPNHl8QFJ0rN1uWvo9cA7gO8l2TFo+21gFKCqPg7cDrwR2A0cAtw2lqRF1uVZQ3cx/BjAzD4FvLerGiRJ8/PcLElqnEEgSY0zCCSpcQaBJDXOIJCkxhkEktQ4g0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuM6C4Ikn0iyL8kDc8y/PMkTSXYMfj7UVS2SpLmt6HDdnwQ2AzefoM83qupNHdYgSZpHZ1sEVfV14LGu1i9Jem70fYzgdUnuS/IXSX56rk5JNibZnmT7/v37F7M+SVr2+gyCe4GLqurVwO8DX5yrY1Vtqarxqhpfu3btohUoSS3oLQiq6sdVdXAwfTuwMsmavuqRpFb1FgRJRpJkMH3poJYf9lWPJLWqs7OGknwWuBxYk2QPMAGsBKiqjwPXAL+W5AhwGLi2qqqreiRJw3UWBFX11nnmb2b69FJJUo/6PmtIktQzg0CSGmcQSFLjDAJJapxBIEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4wwCSWqcQSBJjTMIJKlxBoEkNc4gkKTGGQSS1DiDQJIaZxBIUuMMAklqnEEgSY0zCCSpcQaBJDXOIJCkxnUWBEk+kWRfkgfmmJ8kNyTZneT+JK/pqhZJ0ty63CL4JHDlCea/AXj54Gcj8LEOa5EkzWFFVyuuqq8nGTtBl6uAm6uqgG8lOT/Juqp6tKuaflLb1AGOPHxb18PMPf5TT8Cxp3sbf8k5ayV53nm9llBTB4Dze61hqdi6dSuTk5O91nB8/ImJiV7rABgbG2PDhg19l9GpzoJgAS4AHpnxes+g7VlBkGQj01sNjI6OntagY2Njp7X8c2Hv3immpqrvMpaMVauez8hI3x/C5y+J94amrVq1qu8SmtJnEGRI29BPx6raAmwBGB8fP61P0OWe7NLp8m+kPX2eNbQHWD/j9YXAD3qqRZKa1WcQbAN+dXD20GXAE4txfECS9Eyd7RpK8lngcmBNkj3ABLASoKo+DtwOvBHYDRwC3B6VpB50edbQW+eZX8B7uxpfkrQwXlksSY0zCCSpcQaBJDXOIJCkxmX6mO2ZI8l+4OG+61hG1gAH+i5CGsL35nProqpaO2zGGRcEem4l2V5V433XIc3me3PxuGtIkhpnEEhS4wwCbem7AGkOvjcXiccIJKlxbhFIUuMMAklqnEHQqCRXJvmrJLuT/Pu+65GOS/KJJPuSPNB3La0wCBqU5GzgD4A3AK8E3prklf1WJf3EJ4Er+y6iJQZBmy4FdlfV96vqKeBPgKt6rkkCoKq+DjzWdx0tMQjadAHwyIzXewZtkhpkELQpQ9o8j1hqlEHQpj3A+hmvLwR+0FMtknpmELTpO8DLk7wkyfOAa4FtPdckq
ScGQYOq6gjwPuB/ALuAz1XVg/1WJU1L8lngfwP/KMmeJO/uu6blzltMSFLj3CKQpMYZBJLUOINAkhpnEEhS4wwCSWqcQaBlIcnRJDuSPJDkT5Oc03dNMyU5eDLt0mIyCLRcHK6qi6vqVcBTwL9a6IKDu7FKzTIItBx9A3gZQJK3J/n2YGvhj45/6Cc5mOT6JHcDr0vy4SQ7k9yf5L8M+lyU5KuDtq8mGR20fzLJDUm+meT7Sa4ZtL9w0O/eJN9Lckp3dD3BuG9OcneS7yb5SpIXD9qvG9zD/85BPe8/3f+AaotBoGUlyQqmn7PwvST/GPgV4PVVdTFwFHjboOsLgAeq6meBncDVwE9X1c8A/3HQZzNw86DtM8ANM4ZaB/xT4E3AhwdtU8DVVfUa4OeA/5pk2A3+5jPXuHcBl1XVJUzfOvy3ZizzCuBfMH2L8YkkK09hXDVqRd8FSM+R1Ul2DKa/AdwEbAReC3xn8Hm8Gtg36HMUuHUw/WOmP8RvTPLnwJcH7a8DfnEw/Wngd2eM98WqOgbsPP7NnOm7uv7nJP8MOMb0rb1fDOw9yX/LXONeCNySZB3wPOBvZizz51X1JPBkkn2Dcfec5LhqlEGg5eLw4Fv/Twy+jX+qqj44pP9UVR2F6XsvJbkU+OdM34DvfcAVQ5aZeT+WJ2cONfj9NmAt8NqqejrJJLDqVP4xc4z7+8BHqmpbksuB6+ao5yj+beskuGtIy9lXgWuS/EOAJD+V5KLZnZK8EDivqm4HPgAcD5RvMh0MMP0hf9c8450H7BuEwM8BzxprgeYa9zzgbwfT7zzFdUvP4rcGLVtVtTPJfwD+MslZwNPAe4GHZ3U9F/izJKuY/nb/bwbt7wc+keTfAfuBDfMM+RngS0m2AzuAhxZQ5jlJZu7C+cgJxr0O+NMkfwt8C3jJAtYvzcu7j0pS49w1JEmNMwgkqXEGgSQ1ziCQpMYZBJLUOINAkhpnEEhS4/4//7Ow2m1pUjoAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#Box Plot family and personal loan coparison\n", "sns.boxplot(y=\"Family\", orient=\"v\", x=\"Personal Loan\", data=df_original)\n", "# People are from 3-4 most probably these are people, having 1 more children seems to have opeted for personal loan" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAASTElEQVR4nO3dfZBdd13H8fcnhDYtIA0m7aYP9qIWsTpYYAUdhGmsgzyotZIqiLbsoFUHBlFHLeqwjaO2o4AjPjBGaCgMgqW1UrWjQqbaMgo0xU4paauZZkvTZputCQg2afPw9Y89e9g2m+S25O653ft+zWTuPb89d88nmc1+7nm4v5OqQpIkgGVdB5AkDQ9LQZLUshQkSS1LQZLUshQkSa3lXQf4Rqxatap6vV7XMSTpKeXWW299qKpWL/S1p3Qp9Ho9Nm/e3HUMSXpKSXLv4b7m4SNJUstSkCS1LAVJUstSkCS1LAUBsHv3bt75zneye/furqNI6pClIACuueYa7rrrLq699tquo0jqkKUgdu/ezY033khVceONN7q3II0wS0Fcc801zE2hfvDgQfcWpBFmKYibb76Z/fv3A7B//35uuummjhNJ6oqlIF7+8pezfPnsh9uXL1/OK17xio4TSeqKpSDWrVvXPk/C6173ug7TSOqSpSBWrlzJ2NgYAGNjY6xcubLjRJK6YimI3bt38+CDDwIwPT3t1UfSCLMU9Jirj6rKq4+kEWYpyKuPJLUsBXn1kaSWpSDWrVtHEgCWLVvm1UfSCLMUxMqVK1m7di1JWLt2rVcfSSPsKX07Th0769at47777nMvQRpxmbvq5KlofHy8vEezJD0xSW6tqvGFvubhI0lSy1KQJLUsBQGwbds2LrroIqamprqOIqlDloIAuPzyy9mzZw+XX35511EkdchSENu2bWvnO9q1a5d7C9IIsxR0yN6BewvS6PJzCjpkVtRdu3Z1lETDZuPGjZ3vOU5PTwO007t3qdfrMTEx0XWMgRrYnkKSM5LcmOTOJF9M8svN+HOSfDLJfzePK5vxJHlvkq1Jbk/yokFlk/TUsXfvXvbu3dt1jJExyD2F/cCvVdXnkzwLuDXJJ4E3AZuq6ooklwKXAr8JvBo4q/nzUuB9zaOkjgzDu+LJyUkA1q9f33GS0TCwPYWq2lFVn2+efxW4EzgNOB+4qlntKuDHm+fnAx+qWZ8BTkqyZlD5JEmHWpQTzUl6wAuBzwKnVNUOmC0O4ORmtdOA++a9bHsz9vjvdUmSzUk2z8zMDDK2JI2cgZdCkmcC1wJvr6r/PdKqC4wdMjFTVW2oqvGqGl+9evWxiilJYsClkOTpzBbCR6rqb5vhB+cOCzWPO5vx7cAZ815+OvDAIPNJkh5rkFcfBfgAcGdVvWfel64HLm6eXwx8Yt74R
c1VSN8HfGXuMJMkaXEM8uqjlwE/C3whyW3N2G8BVwBXJ3kz8CXgwuZrNwCvAbYCDwPdX/YgSSNmYKVQVZ9m4fMEAOctsH4BbxlUHknS0TnNhSSpZSlIklqWgiSpZSlIklqWgiSpZSlIklqWgjjuuOMes3z88cd3lERS1ywFMfsRka87ePBgR0kkdc1SEPv27TvisqTRYSlIklqWgiSpZSlIklqWgiSpZSlIklqWgiSpZSlIklqWgiSpZSlIklqWgiSpNbB7NEt68jZu3MjU1FTXMYbC3L/D5ORkt0GGRK/XY2JiYmDf31KQhtDU1BT33L2FNSe6M7983+wEjXvuu6vjJN3b8fDgJ6u0FKQhtebEZfzc2Sd0HUND5P1b9gx8G74NkSS1LAVJUstSEMuWLTvisqTR4f9+HXKnNe+8Jo0uS0GS1LIUJEktS0GS1LIUJEktS0GS1LIUJEktS0GS1LIUJEktJ8Tr2LBOkdzVNMWDnhZY0pG5pyBOOumkxyyvXLmyoySSuuaeQseG5V3xhRde2D7fsGFDh0kkdWlgewpJrkyyM8kd88YuS3J/ktuaP6+Z97V3JNma5O4kPzyoXFrY3N7Ca1/72o6TSOrSIPcUPgj8GfChx43/cVW9a/5AkrOB1wPfBZwKfCrJ86rqwADzaZ5TTz2VU089lTe96U1dR5HUoYHtKVTVTcCuPlc/H/hYVT1SVduArcBLBpVNkrSwLk40vzXJ7c3hpbkzmqcB981bZ3szdogklyTZnGTzzMzMoLNK0khZ7FJ4H/BtwDnADuDdzXgWWLcW+gZVtaGqxqtqfPXq1YNJKUkjalFLoaoerKoDVXUQ+Cu+fohoO3DGvFVPBx5YzGySpEUuhSRr5i1eAMxdmXQ98Pokxyd5LnAW8LnFzCZJGuDVR0k+CpwLrEqyHZgEzk1yDrOHhqaAXwCoqi8muRrYAuwH3uKVR5K0+AZWClX1hgWGP3CE9X8f+P1B5ZEkHZ2faJaG0PT0NA8/fJD3b9nTdRQNkR0PH+TE6emBbsO5jyRJLfcUpCE0NjbGnn1f5ufOPqHrKBoi79+yhxPGxga6DfcUJEktS0GS1LIUJEktS0GS1LIUJEktS0GS1LIUJEmtvkohyfOSbJq7tWaSFyT5ncFGkyQttn73FP4KeAewD6Cqbmf29pmSpCWk31I4saoeP5X1/mMdRpLUrX5L4aEk30ZzN7Qk65i9c5okaQnpd+6jtwAbgOcnuR/YBvzMwFJJkjrRVylU1T3ADyV5BrCsqr462FiSpC70VQpJTgIuAnrA8iQAVNXbBpZMkrTo+j18dAPwGeALwMHBxZEkdanfUlhRVb860CSSpM71WwofTvLzwD8Aj8wNVtWugaSSxA5vxwnA/+ydPTjxzSucgGHHwwf51gFvo99SeBT4I+C3aS5LbR4HnU8aSb1er+sIQ2P/1BQAJ5zR6zTHMPhWBv+z0W8p/Crw7VX10CDDSJo1MTHRdYShMTk5CcD69es7TjIa+t0f+yLw8CCDSJK61++ewgHgtiQ38thzCl6SKklLSL+l8HfNH0nSEtbvJ5qvSnIc8Lxm6O6q2je4WJKkLvT7ieZzgauAKSDAGUkurqqbBhdNkrTY+j189G7glVV1N8zedAf4KPDiQQWTJC2+fq8+evpcIQBU1X8BTx9MJElSV/rdU9ic5APAh5vlNwK3DiaSJKkr/ZbCLzF7T4W3MXtO4SbgLwYVSpLUjX5LYTnwJ1X1HoAkTwOOH1gqSVIn+j2nsAk4Yd7yCcCnjn0cSVKX+i2FFVX1tbmF5vmJg4kkSepKv6Xwf0leNLeQ5MWAc/pK0hLT7zmFtwMfT/JAs7wG+KnBRJIkdaXfaS5uSfJ84DuYvfroLqe5kKSl54ncyuh7gRcALwTekOSiI62c5MokO5PcMW/sOUk+meS/m8eVzXiSvDfJ1iS3zz9UJUlaPH2VQpIPA+8CfoDZcvheYPwoL/sg8KrHjV0KbKqqs5i9ounSZ
vzVwFnNn0uA9/WTS5J0bPV7TmEcOLuq6qhrNqrqpiS9xw2fD5zbPL8K+FfgN5vxDzXf/zNJTkqypqp29Ls9SdI3rt/DR3cAY8dge6fM/aJvHk9uxk8D7pu33vZm7BBJLkmyOcnmmZmZYxBJkjSn3z2FVcCWJJ/jsXde+7FjlCMLjC24V1JVG4ANAOPj433vuUiSjq7fUrjsGG3vwbnDQknWADub8e3AGfPWOx144JBXS5IGqt9LUv/tGG3veuBi4Irm8RPzxt+a5GPAS4GveD5BkhbfEUshyVdZ+DBOgKqqbzrCaz/K7EnlVUm2A5PMlsHVSd4MfAm4sFn9BuA1wFbgYWDiif01JEnHwhFLoaqe9WS/cVW94TBfOm+BdYvZqbklSR16Ih9ekyQtcZaCJKllKUiSWpaCJKllKUiSWpaCJKllKUiSWv1Oc7Ekbdy4kampqa5jDIW5f4fJyclugwyJXq/HxISfodToGelSmJqaYstdW8mKVV1H6Vw9Ojsn4Z1TX+44Sfdq70NdR5A6M9KlAJAVq1h+5gVdx9AQ2X/vdV1HkDrjOQVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1LAVJUmvk79Es6fA2btzI1NRUpxnmtj85OdlpDoBer8fExETXMQZqpEthenqa2vs1b9Sux6i9DzE9vbfrGGqsWLGi6wgjZaRLQdKRLfV3xTrUSJfC2NgYu/d+meVnXtB1FA2R/fdex9jYSV3HkDrhiWZJUstSkCS1LAVJUquTcwpJpoCvAgeA/VU1nuQ5wN8APWAK+Mmq2t1FPkkaVV3uKaytqnOqarxZvhTYVFVnAZuaZUnSIhqmw0fnA1c1z68CfrzDLJI0kroqhQL+JcmtSS5pxk6pqh0AzePJC70wySVJNifZPDMzs0hxJWk0dPU5hZdV1QNJTgY+meSufl9YVRuADQDj4+M1qICSNIo62VOoqgeax53AdcBLgAeTrAFoHnd2kU2SRtmil0KSZyR51txz4JXAHcD1wMXNahcDn1jsbJI06ro4fHQKcF2Sue3/dVX9U5JbgKuTvBn4EnBhB9kkaaQteilU1T3A9yww/j/AeYudR5L0dcN0SaokqWOWgiSpZSlIklqWgiSpZSlIklqWgiSpZSlIklojfY9mgNr7EPvvva7rGJ2rR78CQI57dsdJuld7HwK8R7NG00iXQq/X6zrC0Jia+jIAvZ6/DOEkfzY0ska6FCYmJrqOMDQmJycBWL9+fcdJJHXJcwqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqWQqSpJalIElqLe86wKjbuHEjU1NTXcdoM0xOTnaao9frMTEx0WkGaZRZCgJgxYoVXUeQNAQshY75rljSMPGcgiSpZSlIklpDVwpJXpXk7iRbk1zadR5JGiVDVQpJngb8OfBq4GzgDUnO7jaVJI2OoSoF4CXA1qq6p6oeBT4GnN9xJkkaGcNWCqcB981b3t6MtZJckmRzks0zMzOLGk6SlrphK4UsMFaPWajaUFXjVTW+evXqRYolSaNh2EphO3DGvOXTgQc6yiJJIydVdfS1FkmS5cB/AecB9wO3AD9dVV88zPozwL2Ll3DJWwU81HUIaQH+bB5bZ1bVgodahuoTzVW1P8lbgX8GngZcebhCaNb3+NExlGRzVY13nUN6PH82F89QlQJAVd0A3NB1DkkaRcN2TkGS1CFLQfNt6DqAdBj+bC6SoTrRLEnqlnsKkqSWpSBJalkKcmZaDa0kVybZmeSOrrOMCkthxDkzrYbcB4FXdR1ilFgKcmZaDa2qugnY1XWOUWIp6Kgz00oaHZaCjjozraTRYSnImWkltSwF3
QKcleS5SY4DXg9c33EmSR2xFEZcVe0H5mamvRO4+kgz00qLKclHgf8AviPJ9iRv7jrTUuc0F5KklnsKkqSWpSBJalkKkqSWpSBJalkKkqSWpaAlJ8mBJLcluSPJx5Oc2HWm+ZJ87YmMS4vJUtBStKeqzqmq7wYeBX6x3xc2s8ZKI8tS0FJ3M/DtAEl+Jsnnmr2Iv5wrgCRfS/K7ST4LfH+SK5JsSXJ7knc165yZZFMztinJtzTjH0zy3iT/nuSeJOua8Wc2630+yReSPKmZZ4+w3R9N8tkk/5nkU0lOacYva+5B8K9Nnrd9o/+AGi2WgpasJMuZvU/EF5J8J/BTwMuq6hzgAPDGZtVnAHdU1UuBLcAFwHdV1QuA32vW+TPgQ83YR4D3ztvUGuAHgB8BrmjG9gIXVNWLgLXAu5MsNPng0Rxuu58Gvq+qXsjsdOe/Me81zwd+mNlp0SeTPP1JbFcjannXAaQBOCHJbc3zm4EPAJcALwZuaX43nwDsbNY5AFzbPP9fZn+hvz/JPwL/0Ix/P/ATzfMPA384b3t/V1UHgS1z79iZnX32D5K8AjjI7HTkpwDTT/Dvcrjtng78TZI1wHHAtnmv+ceqegR4JMnOZrvbn+B2NaIsBS1Fe5q9gVbzLv2qqnrHAuvvraoDMDsXVJKXAOcxOzngW4EfXOA18+eHeWT+pprHNwKrgRdX1b4kU8CKJ/OXOcx2/xR4T1Vdn+Rc4LLD5DmA/8/1BHj4SKNiE7AuyckASZ6T5MzHr5TkmcCzq+oG4O3AXLn8O7MlAbO/8D99lO09G9jZFMJa4JBt9elw2302cH/z/OIn+b2lQ/gOQiOhqrYk+R3gX5IsA/YBbwHufdyqzwI+kWQFs+/6f6UZfxtwZZJfB2aAiaNs8iPA3yfZDNwG3NVHzBOTzD/M854jbPcy4ONJ7gc+Azy3j+8vHZWzpEqSWh4+kiS1LAVJUstSkCS1LAVJUstSkCS1LAVJUstSkCS1/h8cWL0O03tw8wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "#Income and Personal Loan\n", "sns.boxplot(y=\"Income\", orient=\"v\", x=\"Personal Loan\", data=df_original)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAQHUlEQVR4nO3df6zddX3H8ecLCqJT+SFXxlpmyew2cXGoDeBcFoQF0Kmggw2ns3Mk3RKcumz+YFkGA1nQqChOzYhUCnEg6ib4I3GsytQwgTIYQhmjwx9UlNYVUaagre/9cT/FQ7n3fk5rz72nnOcjuTnf7/v7+Z7v+5LSV78/zuekqpAkaS57LHQDkqTxZ1hIkroMC0lSl2EhSeoyLCRJXYsWuoFROPDAA2vp0qUL3YYk7VZuvPHG71TV1EzbHpNhsXTpUtauXbvQbUjSbiXJ12fb5mUoSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lS12PyE9y7wh/8zTUL3YLG0D+effRCtyAtCM8sJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHWNPCyS7JnkpiSfauuHJrkuyZ1JPpJk71Z/XFtf37YvHXiPM1r9jiTHj7pnSdIjzceZxeuB2wfW3wacX1XLgPuA01r9NOC+qno6cH4bR5LDgFOBZwInAO9Psuc89C1JakYaFkmWAL8DfLCtBzgG+Fgbsho4qS2f2NZp249t408ELq+qh6rqq8B64IhR9i1JeqRRn1m8G3gT8JO2/hTgu1W1pa1vABa35cXA3QBt+/1t/MP1GfaRJM2DkYVFkhcDG6vqxsHyDEOrs22ufQaPtzLJ2iRrN23atMP9SpJmN8ozi+cDL03yNeBypi8/vRvYL8miNmYJcE9b3gAcAtC27wtsHqzPsM/DqurCqlpeVcunpqZ2/W8jSRNsZGFRVWdU1ZKqWsr0DerPVdUrgc8DJ7dhK4Ar2/JVbZ22/XNVVa1+anta6lBgGXD9qPqWJD3aov6QXe7NwOVJ3grcBFzU6hcBlyZZz/QZxakAVXVbkiuAdcAW4PSq2jr/bUvS5JqXsKiqa4Br2vJdzPA0U1U9CJwyy/7nAueOrkNJ0lz8BLckqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6RhYWSfZJcn2S/0xyW5K/bfVDk1yX5M4kH0myd6s/rq2vb9uXDrzXGa1+R5LjR9WzJGlmozyzeAg4pqp+HTgcOCHJUcDbgPOrahlwH3BaG38acF9VPR04v40jy
WHAqcAzgROA9yfZc4R9S5K2M7KwqGkPtNW92k8BxwAfa/XVwElt+cS2Ttt+bJK0+uVV9VBVfRVYDxwxqr4lSY820nsWSfZMcjOwEbga+B/gu1W1pQ3ZACxuy4uBuwHa9vuBpwzWZ9hn8Fgrk6xNsnbTpk2j+HUkaWKNNCyqamtVHQ4sYfps4BkzDWuvmWXbbPXtj3VhVS2vquVTU1M727IkaQbz8jRUVX0XuAY4CtgvyaK2aQlwT1veABwC0LbvC2werM+wjyRpHozyaaipJPu15ccDvw3cDnweOLkNWwFc2Zavauu07Z+rqmr1U9vTUocCy4DrR9W3JOnRFvWH7LSDgdXtyaU9gCuq6lNJ1gGXJ3krcBNwURt/EXBpkvVMn1GcClBVtyW5AlgHbAFOr6qtI+xbkrSdkYVFVd0CPHuG+l3M8DRTVT0InDLLe50LnLure5QkDcdPcEuSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXUOFRZI1w9QkSY9Nc846m2Qf4AnAgUn256ffWvdk4BdG3JskaUz0pij/E+ANTAfDjfw0LL4HvG+EfUmSxsicYVFV7wHek+TPquq989STJGnMDPXlR1X13iS/ASwd3KeqLhlRX5KkMTJUWCS5FPgl4GZg21eaFmBYSNIEGPZrVZcDh1VVjbIZSdJ4GvZzFrcCPz/KRiRJ42vYM4sDgXVJrgce2lasqpeOpCtJ0lgZNizOGmUTkqTxNuzTUP826kYkSeNr2Kehvs/0008AewN7Af9XVU8eVWOSpPEx7JnFkwbXk5wEHDGSjiRJY2enZp2tqk8Ax+ziXiRJY2rYy1AvH1jdg+nPXfiZC0maEMM+DfWSgeUtwNeAE3d5N5KksTTsPYvXjLoRSdL4GvbLj5Yk+eckG5Pcm+TjSZaMujlJ0ngY9gb3h4CrmP5ei8XAJ1tNkjQBhg2Lqar6UFVtaT8XA1Mj7EuSNEaGDYvvJHlVkj3bz6uA/x1lY5Kk8TFsWPwx8HvAt4FvAScD3vSWpAkx7KOz5wArquo+gCQHAO9gOkQkSY9xw55ZPGtbUABU1Wbg2aNpSZI0boYNiz2S7L9tpZ1ZDHtWIknazQ0bFu8Erk1yTpKzgWuBt8+1Q5JDknw+ye1Jbkvy+lY/IMnVSe5sr/u3epJckGR9kluSPGfgvVa08XcmWbFzv6okaWcNFRZVdQnwu8C9wCbg5VV1aWe3LcBfVNUzgKOA05McBrwFWFNVy4A1bR3ghcCy9rMS+AA8fBZzJnAk0zPdnjl4liNJGr2hLyVV1Tpg3Q6M/xbTT05RVd9PcjvTH+g7ETi6DVsNXAO8udUvqaoCvpxkvyQHt7FXt/skJLkaOAG4bNheJEk/m52aonxHJVnK9A3x64CDWpBsC5SntmGLgbsHdtvQarPVtz/GyiRrk6zdtGnTrv4VJGmijTwskjwR+Djwhqr63lxDZ6jVHPVHFqourKrlVbV8asoPl0vSrjTSsEiyF9NB8eGq+qdWvrddXqK9bmz1DcAhA7svAe6Zoy5JmicjC4skAS4Cbq+qdw1sugrY9kTTCuDKgfqr21NRRwH3t8tUnwWOS7J/u7F9XKtJkubJKD8r8XzgD4GvJLm51f4KOA+4IslpwDeAU9q2zwAvAtYDP6BNJ1JVm5OcA9zQxp297Wa3JGl+jCwsqupLzHy/AeDYGcYXcPos77UKWLXrupMk7Yh5eRpKkrR7MywkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpK6RhUWSVUk2Jrl1oHZAk
quT3Nle92/1JLkgyfoktyR5zsA+K9r4O5OsGFW/kqTZjfLM4mLghO1qbwHWVNUyYE1bB3ghsKz9rAQ+ANPhApwJHAkcAZy5LWAkSfNnZGFRVV8ANm9XPhFY3ZZXAycN1C+paV8G9ktyMHA8cHVVba6q+4CreXQASZJGbL7vWRxUVd8CaK9PbfXFwN0D4za02mz1R0myMsnaJGs3bdq0yxuXpEk2Lje4M0Ot5qg/ulh1YVUtr6rlU1NTu7Q5SZp08x0W97bLS7TXja2+AThkYNwS4J456pKkeTTfYXEVsO2JphXAlQP1V7enoo4C7m+XqT4LHJdk/3Zj+7hWkyTNo0WjeuMklwFHAwcm2cD0U03nAVckOQ34BnBKG/4Z4EXAeuAHwGsAqmpzknOAG9q4s6tq+5vmkqQRG1lYVNUrZtl07AxjCzh9lvdZBazaha1JknbQuNzgliSNMcNCktRlWEiSugwLSVKXYSFJ6jIsJEldhoUkqcuwkCR1GRaSpC7DQpLUZVhIkroMC0lSl2EhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1DWyr1WVNBr//Y4/WugWNIZ++S8vHun7e2YhSeoyLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV2GhSSpy7CQJHUZFpKkLsNCktRlWEiSugwLSVKXYSFJ6jIsJEldu01YJDkhyR1J1id5y0L3I0mTZLcIiyR7Au8DXggcBrwiyWEL25UkTY7dIiyAI4D1VXVXVf0IuBw4cYF7kqSJsWihGxjSYuDugfUNwJGDA5KsBFa21QeS3DFPvU2CA4HvLHQT4+Cycxa6A23HP5vbvHH1rniXp822YXcJi8xQq0esVF0IXDg/7UyWJGuravlC9yFtzz+b82d3uQy1AThkYH0JcM8C9SJJE2d3CYsbgGVJDk2yN3AqcNUC9yRJE2O3uAxVVVuSvBb4LLAnsKqqblvgtiaJl/c0rvyzOU9SVf1RkqSJtrtchpIkLSDDQpLUZVhoTk6zonGUZFWSjUluXeheJoVhoVk5zYrG2MXACQvdxCQxLDQXp1nRWKqqLwCbF7qPSWJYaC4zTbOyeIF6kbSADAvNpTvNiqTJYFhoLk6zIgkwLDQ3p1mRBBgWmkNVbQG2TbNyO3CF06xoHCS5DPh34FeSbEhy2kL39FjndB+SpC7PLCRJXYaFJKnLsJAkdRkWkqQuw0KS1GVYaGIk2Zrk5iS3JvlokicsdE+DkjywI3VpPhkWmiQ/rKrDq+rXgB8Bfzrsjm0GXmliGRaaVF8Eng6Q5FVJrm9nHf+wLRiSPJDk7CTXAc9Lcl6SdUluSfKONuZpSda02pokv9jqFye5IMm1Se5KcnKrP7GN+48kX0myU7P4znHclyS5LslNSf41yUGtflb7DohrWj+v+1n/A2qyGBaaOEkWMf0dHV9J8gzg94HnV9XhwFbglW3ozwG3VtWRwDrgZcAzq+pZwFvbmL8HLmm1DwMXDBzqYOA3gRcD57Xag8DLquo5wAuAdyaZacLGntmO+yXgqKp6NtNTyr9pYJ9fBY5neur5M5PstRPH1YRatNANSPPo8UlubstfBC4CVgLPBW5of2c/HtjYxmwFPt6Wv8f0X/QfTPJp4FOt/jzg5W35UuDtA8f7RFX9BFi37V/4TM/k+3dJfgv4CdNTvh8EfHsHf5fZjrsE+EiSg4G9ga8O7PPpqnoIeCjJxnbcDTt4XE0ow0KT5Ift7OFh7V/1q6vqjBnGP1hVW2F6nqwkRwDHMj2h4muBY2bYZ3D+nIcGD9VeXwlMAc+tqh8n+Rqwz878MrMc973Au6rqqiRHA2fN0s9W/P9fO8DLUJp0a4CTkzwVIMkBSZ62/aAkTwT2rarPAG8AtoXOtUyHB0wHwZc6x9sX2NiC4gXAo441pNmOuy/wzba8YiffW3oU/2WhiVZV65L8NfAvSfYAfgycDnx9u6FPAq5Msg/TZwl/3uqvA
1YleSOwCXhN55AfBj6ZZC1wM/BfQ7T5hCSDl4veNcdxzwI+muSbwJeBQ4d4f6nLWWclSV1ehpIkdRkWkqQuw0KS1GVYSJK6DAtJUpdhIUnqMiwkSV3/D5Gr4EU9E4FeAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Lets Analyse using count plot\n", "sns.countplot(x='Personal Loan',data=df_original)\n", "# Even though we have established alot ot relationship between family, age, income, ccavg, income and decided to go further.\n", "# But the content of our data seem insufficient, we have very few cases of People who have opted for personal loan and\n", "# More cases of people who have rejected personal loan offer from bank. This imbalance sometimes can affect in model building\n", "# But in our case this is acceptable, because in real life people who take pl would be less." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.countplot(x='Education', hue='Personal Loan',data=df_original)\n", "# From Graph it is evident 3 > 2 >1 where\n", "# 3: Working, 2: Graduates, 1: Under Graduates" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAWVklEQVR4nO3dfZBV9Z3n8ffXpqWtUeOKzWhoIrA6Y/AJnRYdoylK3MSgYmbUDQ6jErTIpDSaRSvjOCkDVrYySaZiYrDiMqWJRkesoE7U8WFNhBgzCjaIRMBEdoyxldIWMyhRgrTf/aMvpGlu0xfs07fb835V3erz8DvnfLm39HN/5+F3IzORJJXXHvUuQJJUXwaBJJWcQSBJJWcQSFLJGQSSVHLD6l3ArjrggANyzJgx9S5DkoaUZcuWvZ6ZzdXWDbkgGDNmDG1tbfUuQ5KGlIh4sbd1nhqSpJIzCCSp5AwCSSq5IXeNQFJ5vPvuu7S3t7Np06Z6lzJkNDU10dLSQmNjY83bGASSBq329nb22WcfxowZQ0TUu5xBLzNZv3497e3tjB07tubtPDUkadDatGkTI0aMMARqFBGMGDFil3tQBoGkQc0Q2DW7834ZBJJUcgaBpCGloaGBCRMmcMQRR3Duuefy9ttv17uk7ey99967tHww8GLxIPPrf55R7xJ22Z9d+YN6l6AS2WuvvVixYgUA06dP58Ybb2T27Nk1bdvZ2UlDQ0OR5Q1J9ggkDVknn3wya9euBeC2225j4sSJTJgwgc997nN0dnYCXd/Er7nmGo4//nieeOIJrrrqKsaPH89RRx3FlVdeCcCLL77I5MmTOeqoo5g8eTK//e1vAZgxYwaXXXYZJ554IuPGjWPhwoUAbNy4kcmTJ3Psscdy5JFH8uMf/3i36u/tuPfddx/HH388xxxzDKeeeiqvvvoqAHPmzGHmzJlMmjSJcePGcf311+/+m9eNQSBpSNqyZQsPPvggRx55JGvWrOHOO+/kF7/4BStWrKChoYHbb78dgN///vccccQRLFmyhPHjx3PPPfewatUqVq5cyZe//GUALr30Ui644AJWrlzJ9OnTueyyy7YdZ926dTz++OPcf//9XHXVVUDXvfr33HMPy5cvZ9GiRVxxxRXszs/+9nbck046iSeffJKnn36aadOm8Y1vfGPbNs899xwPP/wwS5cuZe7cubz77ru7/R5u5akhSUPKO++8w4QJE4CuHsFFF13E/PnzWbZsGccdd9y2NiNHjgS6rimcffbZAOy77740NTVx8cUXc/rpp3PGGWcA8MQTT3D33XcDcP755/OlL31p2/E+/elPs8ceezB+/Pht38wzk6uvvprHHnuMPfbYg5dffplXX32VAw88cJf+Lb0dt729nc985jOsW7eOzZs3b/dMwOmnn87w4cMZPnw4I0eO5NVXX6WlpWXX3sQeDAJJQ0r3awRbZSYXXnghX/va13Zo39TUtO26wLBhw1i6dCk//elPW
bBgAfPmzePRRx/dYZvut2AOHz58u+MA3H777XR0dLBs2TIaGxsZM2ZMvzz9vPW4X/jCF5g9ezZTp05l8eLFzJkzp2o9DQ0NbNmy5X0f11NDkoa8yZMns3DhQl577TUA3njjDV58ccdRlzdu3MiGDRuYMmUK3/72t7cFyoknnsiCBQuArv/Jn3TSSTs93oYNGxg5ciSNjY0sWrSo6rFq0dtxN2zYwKhRowC45ZZbdmvfu8IegaQhb/z48Xz1q1/lE5/4BO+99x6NjY3ccMMNHHzwwdu1e+uttzjrrLPYtGkTmcl1110HwPXXX8/MmTP55je/SXNzM9///vd3erzp06dz5pln0trayoQJEzjssMP6rPHtt9/e7hTO7Nmzez3unDlzOPfccxk1ahQnnHACL7zwwq6+JbskducCRz21trbmB/mHabx9VPqjNWvW8NGPfrTeZQw51d63iFiWma3V2ntqSJJKrrAgiIimiFgaEc9ExKqImFulzfCIuDMi1kbEkogYU1Q9kqTqiuwR/AE4JTOPBiYAp0XECT3aXAT8LjMPAa4Dvl5gPZKkKgoLguyysTLbWHn1vCBxFrD1kvhCYHI41KAkDahCrxFERENErABeAx7JzCU9mowCXgLIzC3ABmBElf3Mioi2iGjr6OgosmRJKp1CgyAzOzNzAtACTIyII3o0qfbtf4fbmDJzfma2ZmZrc3NzEaVKUmkNyHMEmflfEbEYOA14ttuqdmA00B4Rw4APAW8MRE2Syutvrlncr/v712sn9dnmoYce4vLLL6ezs5OLL75427hFg0GRdw01R8R+lem9gFOB53o0uxe4sDJ9DvBoDrUHGySpD52dnVxyySU8+OCDrF69mjvuuIPVq1fXu6xtijw1dBCwKCJWAk/RdY3g/oi4NiKmVtrcBIyIiLXAbGDwRKQk9ZOlS5dyyCGHMG7cOPbcc0+mTZu220NXF6GwU0OZuRI4psrya7pNbwLOLaoGSRoMXn75ZUaPHr1tvqWlhSVLet47Uz8+WSxJBat2xnsw3SlvEEhSwVpaWnjppZe2zbe3t/PhD3+4jhVtzyCQpIIdd9xxPP/887zwwgts3ryZBQsWMHXq1L43HCAOQy2pdGq53bM/DRs2jHnz5vHJT36Szs5OZs6cyeGHHz6gNeyMQSBJA2DKlClMmTKl3mVU5akhSSo5g0CSSs4gkKSSMwgkqeQMAkkqOYNAkkrO20cllc6v/3lGv+7vz678QZ9tZs6cyf3338/IkSN59tln+2w/kOwRSNIAmDFjBg899FC9y6jKIJCkAfDxj3+c/fffv95lVGUQSFLJGQSSVHJeLNag1d+/KzsQBnowM6k/2COQpJKzRyCpdGq53bO/nXfeeSxevJjXX3+dlpYW5s6dy0UXXTTgdVRjEEjSALjjjjvqXUKvPDUkSSVnEEhSyRkEkga1zKx3CUPK7rxfXiOQ+lF/j2FTtHpcNN0VTU1NrF+/nhEjRhAR9S5n0MtM1q9fT1NT0y5tV1gQRMRo4FbgQOA9YH5mfqdHm0nAj4EXKovuzsxri6pJ0tDS0tJCe3s7HR0d9S5lyGhqaqKlpWWXtimyR7AFuCIzl0fEPsCyiHgkM1f3aPfzzDyjwDokDVGNjY2MHTu23mV84BV2jSAz12Xm8sr0W8AaYFRRx5Mk7Z4BuVgcEWOAY4AlVVb/ZUQ8ExEPRsThvWw/KyLaIqLNLqIk9a/CgyAi9gbuAr6YmW/2WL0cODgzjwa+C/xbtX1k5vzMbM3M1ubm5mILlqSSKTQIIqKRrhC4PTPv7rk+M9/MzI2V6QeAxog4oMiaJEnbKywIouter5uANZn5rV7aHFhpR0RMrNSzvqiaJEk7KvKuoY8B5wO/jIgVlWVXAx8ByMwbgXOAz0fEFuAdYFr69IgkDajCgiAzHwd2+gRIZs4D5hVVgySpbz5ZLEkMvafCof+eDHesIUkqOXsEkgox1H5qdM6+9a6gfuwRSFLJGQSSVHIGg
SSVnEEgSSVnEEhSyRkEklRyBoEklZxBIEklZxBIUskZBJJUcgaBJJXcB3qsoaE21gmUe7wTSfVhj0CSSs4gkKSSMwgkqeQMAkkqOYNAkkrOIJCkkjMIJKnkDAJJKrnCgiAiRkfEoohYExGrIuLyKm0iIq6PiLURsTIiji2qHklSdUU+WbwFuCIzl0fEPsCyiHgkM1d3a/Mp4NDK63jge5W/kqQBUliPIDPXZebyyvRbwBpgVI9mZwG3Zpcngf0i4qCiapIk7WhArhFExBjgGGBJj1WjgJe6zbezY1gQEbMioi0i2jo6OooqU5JKqfAgiIi9gbuAL2bmmz1XV9kkd1iQOT8zWzOztbm5uYgyJam0Cg2CiGikKwRuz8y7qzRpB0Z3m28BXimyJknS9oq8ayiAm4A1mfmtXprdC1xQuXvoBGBDZq4rqiZJ0o6KvGvoY8D5wC8jYkVl2dXARwAy80bgAWAKsBZ4G/hsgfVIkqooLAgy83GqXwPo3iaBS4qqQZLUN58slqSSqykIIuKuiDg9IgwOSfqAqfV/7N8D/gZ4PiL+KSIOK7AmSdIAqikIMvMnmTkdOBb4DfBIRPxHRHy2couoJGmIqvlUT0SMAGYAFwNPA9+hKxgeKaQySdKAqOmuoYi4GzgM+CFwZrd7/e+MiLaiipMkFa/W20fnZeaj1VZkZms/1iNJGmA7DYKI+Otq01v1MmyEJGkI6atHcOZO1iVgEEjSELfTIMhMh3yQpA+4vk4N/W1m3hYRs6ut38lgcpKkIaKvU0N/Uvm7T9GFSJLqo69TQ/+n8nfuwJQjSRpotT5HMBb4AjCm+zaZObWYsiRJA6XW5wj+ja4fmbkPeK+4ciRJA63WINiUmdcXWokkqS5qDYLvRMRXgP8L/GHrwsxcXkhVkqQBU2sQHEnXz06ewh9PDWVlXpI0hNUaBH8FjMvMzUUWI0kaeLUOQ/0MsF+RhUiS6qPWHsGfAs9FxFNsf43A20claYirNQi+UmgVkqS6qSkIMvNnRRciSaqPmq4RRMQJEfFURGyMiM0R0RkRb/axzc0R8VpEPNvL+kkRsSEiVlRe1+zOP0CS9P7U/AtlwDTgR0ArcAFwaB/b/KCy3a07afPzzDyjxhokSQWo+cfrM3Mt0JCZnZn5fWBSH+0fA954f+VJkopWa4/g7YjYE1gREd8A1vHHIarfj7+MiGeAV4ArM3NVP+xTkrQLau0RnF9peynwe2A0cPb7PPZy4ODMPBr4Ll0D21UVEbMioi0i2jo6Ot7nYSVJ3e00CCLiIwCZ+WJmbsrMNzNzbmbOrpwq2m2VfW2sTD8ANEbEAb20nZ+ZrZnZ2tzc/H4OK0nqoa8ewbZv6RFxV38eOCIOjIioTE+s1LK+P48hSepbX9cIotv0uF3ZcUTcQdcF5QMiop2uh9IaATLzRuAc4PMRsQV4B5iWmbkrx5AkvX99BUH2Mt2nzDyvj/Xz6Lq9VJJUR30FwdGVB8cC2KvbQ2QBZGbuW2h1kqTC9fXj9Q0DVYgkqT5qfqBMkvTBZBBIUskZBJJUcgaBJJWcQSBJJWcQSFLJGQSSVHIGgSSVnEEgSSVnEEhSyRkEklRyBoEklZxBIEklZxBIUskZBJJUcgaBJJWcQSBJJWcQSFLJGQSSVHIGgSSVnEEgSSVnEEhSyRUWBBFxc0S8FhHP9rI+IuL6iFgbESsj4tiiapEk9a7IHsEPgNN2sv5TwKGV1yzgewXWIknqRWFBkJmPAW/spMlZwK3Z5Ulgv4g4qKh6JEnV1fMawSjgpW7z7ZVlO4iIWRHRFhFtHR0dA1KcJJVFPYMgqizLag0zc35mtmZma3Nzc8FlSVK51DMI2oHR3eZbgFfqVIsklVY9g+Be4ILK3UMnABsyc10d65GkUhpW1I4j4g5gEnBARLQDXwEaATLzRuABYAqwFngb+GxRtUiSeldYEGTmeX2sT+CSoo4vSaqNTxZLUskZBJJUcgaBJJWcQSBJJWcQSFLJG
QSSVHIGgSSVnEEgSSVnEEhSyRkEklRyBoEklZxBIEklZxBIUskZBJJUcgaBJJWcQSBJJWcQSFLJGQSSVHIGgSSVnEEgSSVnEEhSyRkEklRyBoEklVyhQRARp0XEryJibURcVWX9jIjoiIgVldfFRdYjSdrRsKJ2HBENwA3A/wDagaci4t7MXN2j6Z2ZeWlRdUiSdq7IHsFEYG1m/mdmbgYWAGcVeDxJ0m4oMghGAS91m2+vLOvp7IhYGRELI2J0tR1FxKyIaIuIto6OjiJqlaTSKjIIosqy7DF/HzAmM48CfgLcUm1HmTk/M1szs7W5ubmfy5SkcisyCNqB7t/wW4BXujfIzPWZ+YfK7L8Af1FgPZKkKooMgqeAQyNibETsCUwD7u3eICIO6jY7FVhTYD2SpCoKu2soM7dExKXAw0ADcHNmroqIa4G2zLwXuCwipgJbgDeAGUXVI0mqrrAgAMjMB4AHeiy7ptv0PwD/UGQNkqSd88liSSo5g0CSSs4gkKSSMwgkqeQMAkkqOYNAkkrOIJCkkjMIJKnkDAJJKjmDQJJKziCQpJIzCCSp5AwCSSo5g0CSSs4gkKSSMwgkqeQMAkkqOYNAkkrOIJCkkjMIJKnkDAJJKjmDQJJKziCQpJIzCCSp5AoNgog4LSJ+FRFrI+KqKuuHR8SdlfVLImJMkfVIknZUWBBERANwA/ApYDxwXkSM79HsIuB3mXkIcB3w9aLqkSRVV2SPYCKwNjP/MzM3AwuAs3q0OQu4pTK9EJgcEVFgTZKkHiIzi9lxxDnAaZl5cWX+fOD4zLy0W5tnK23aK/P/r9Lm9R77mgXMqsz+OfCrQooeHA4AXu+zlQYrP7+h64P+2R2cmc3VVgwr8KDVvtn3TJ1a2pCZ84H5/VHUYBcRbZnZWu86tHv8/IauMn92RZ4aagdGd5tvAV7prU1EDAM+BLxRYE2SpB6KDIKngEMjYmxE7AlMA+7t0eZe4MLK9DnAo1nUuSpJUlWFnRrKzC0RcSnwMNAA3JyZqyLiWqAtM+8FbgJ+GBFr6eoJTCuqniGkFKfAPsD8/Iau0n52hV0sliQNDT5ZLEklZxBIUskZBINERNwcEa9Vnq3QEBIRoyNiUUSsiYhVEXF5vWtS7SKiKSKWRsQzlc9vbr1rGmheIxgkIuLjwEbg1sw8ot71qHYRcRBwUGYuj4h9gGXApzNzdZ1LUw0qoxn8SWZujIhG4HHg8sx8ss6lDRh7BINEZj6Gz1AMSZm5LjOXV6bfAtYAo+pblWqVXTZWZhsrr1J9QzYIpH5UGUH3GGBJfSvRroiIhohYAbwGPJKZpfr8DAKpn0TE3sBdwBcz881616PaZWZnZk6gawSEiRFRqtOzBoHUDyrnlu8Cbs/Mu+tdj3ZPZv4XsBg4rc6lDCiDQHqfKhcbbwLWZOa36l2Pdk1ENEfEfpXpvYBTgefqW9XAMggGiYi4A3gC+POIaI+Ii+pdk2r2MeB84JSIWFF5Tal3UarZQcCiiFhJ1xhpj2Tm/XWuaUB5+6gklZw9AkkqOYNAkkrOIJCkkjMIJKnkDAJJKjmDQKUSEZ3dbvFcERFXVWkzKSL69fbByj5P7Db/dxFxQX8eQ9pdhf1UpTRIvVMZSmCgTaJrdNn/AMjMG+tQg1SVPQIJiIjTIuK5iHgc+Otuy+dExJXd5p+tDCxHRFwQESsr49j/sLLszIhYEhFPR8RPIuJPK+3/DvhflV7Iyd33GxETIuLJyr7uiYj/Vlm+OCK+Xhkr/9cRcfIAvR0qGYNAZbNXj1NDn4mIJuBfgDOBk4ED+9pJRBwO/CNwSmYeDWz9MZrHgRMy8xhgAfClzPwNcCNwXWZOyMyf99jdrcDfZ+ZRwC+Br3RbNywzJwJf7LFc6jeeGlLZ7HBqKCImAC9k5vOV+duAWX3s5xRgYWa+DpCZW39LogW4s/JjNXsCL+xsJxHxIWC/zPxZZdEtwI+6Ndk6gN0yYEwfNUm7xR6B1KW3sVa2s
P1/J02Vv9HLNt8F5mXmkcDnurXfXX+o/O3EL24qiEEgdY00OTYi/ntl/rxu634DHAsQEccCYyvLfwr8z4gYUVm3f2X5h4CXK9MXdtvPW8A+PQ+cmRuA33U7/38+8LOe7aQiGQQqm57XCP4pMzfRdSro3ysXi1/s1v4uYP/Kr1d9Hvg1QGauAv438LOIeAbYOvz0HOBHEfFz4PVu+7kP+KutF4t71HQh8M3K6JcTgGv78x8s9cXRRyWp5OwRSFLJGQSSVHIGgSSVnEEgSSVnEEhSyRkEklRyBoEkldz/B62DVViZ8BqtAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Relation between Family& Education to personal load\n", "sns.barplot('Education','Family',hue='Personal Loan',data=df_original,ci=None)\n" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# Classes for all model goes here\n", "# 1. Class Logistic\n", "# 2. Class Knn\n", "# 3. Class NaiveBayes\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Analysing Experience column for unique value [1, 19, 15, 9, 8, 13, 27, 24, 10, 39, 5, 23, 32, 41, 30, 14, 18, 21, 28, 31, 11, 16, 20, 35, 6, 25, 7, 12, 26, 37, 17, 2, 36, 29, 3, 22, 0, 34, 38, 40, 33, 4, 42, 43]\n", "Experience values unique True\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/ashish/installed_apps/anaconda3/lib/python3.7/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " self.obj[item] = s\n" ] }, { "data": { "text/html": [ "
<div>\n", "<table>\n", "  <thead>\n", "    <tr><th></th><th>count</th><th>mean</th><th>std</th><th>min</th><th>25%</th><th>50%</th><th>75%</th><th>max</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>Age</th><td>5000.0</td><td>45.338400</td><td>11.463166</td><td>23.0</td><td>35.0</td><td>45.0</td><td>55.0</td><td>67.0</td></tr>\n", "    <tr><th>Experience</th><td>5000.0</td><td>20.119600</td><td>11.440484</td><td>0.0</td><td>10.0</td><td>20.0</td><td>30.0</td><td>43.0</td></tr>\n", "    <tr><th>Income</th><td>5000.0</td><td>73.774200</td><td>46.033729</td><td>8.0</td><td>39.0</td><td>64.0</td><td>98.0</td><td>224.0</td></tr>\n", "    <tr><th>ZIP Code</th><td>5000.0</td><td>93152.503000</td><td>2121.852197</td><td>9307.0</td><td>91911.0</td><td>93437.0</td><td>94608.0</td><td>96651.0</td></tr>\n", "    <tr><th>Family</th><td>5000.0</td><td>2.396400</td><td>1.147663</td><td>1.0</td><td>1.0</td><td>2.0</td><td>3.0</td><td>4.0</td></tr>\n", "    <tr><th>CCAvg</th><td>5000.0</td><td>1.937938</td><td>1.747659</td><td>0.0</td><td>0.7</td><td>1.5</td><td>2.5</td><td>10.0</td></tr>\n", "    <tr><th>Education</th><td>5000.0</td><td>1.881000</td><td>0.839869</td><td>1.0</td><td>1.0</td><td>2.0</td><td>3.0</td><td>3.0</td></tr>\n", "    <tr><th>Mortgage</th><td>5000.0</td><td>56.498800</td><td>101.713802</td><td>0.0</td><td>0.0</td><td>0.0</td><td>101.0</td><td>635.0</td></tr>\n", "    <tr><th>Securities Account</th><td>5000.0</td><td>0.104400</td><td>0.305809</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n", "    <tr><th>CD Account</th><td>5000.0</td><td>0.060400</td><td>0.238250</td><td>0.0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td></tr>\n", "    <tr><th>Online</th><td>5000.0</td><td>0.596800</td><td>0.490589</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td><td>1.0</td></tr>\n", "    <tr><th>CreditCard</th><td>5000.0</td><td>0.294000</td><td>0.455637</td><td>0.0</td><td>0.0</td><td>0.0</td><td>1.0</td><td>1.0</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " count mean std min 25% \\\n", "Age 5000.0 45.338400 11.463166 23.0 35.0 \n", "Experience 5000.0 20.119600 11.440484 0.0 10.0 \n", "Income 5000.0 73.774200 46.033729 8.0 39.0 \n", "ZIP Code 5000.0 93152.503000 2121.852197 9307.0 91911.0 \n", "Family 5000.0 2.396400 1.147663 1.0 1.0 \n", "CCAvg 5000.0 1.937938 1.747659 0.0 0.7 \n", "Education 5000.0 1.881000 0.839869 1.0 1.0 \n", "Mortgage 5000.0 56.498800 101.713802 0.0 0.0 \n", "Securities Account 5000.0 0.104400 0.305809 0.0 0.0 \n", "CD Account 5000.0 0.060400 0.238250 0.0 0.0 \n", "Online 5000.0 0.596800 0.490589 0.0 0.0 \n", "CreditCard 5000.0 0.294000 0.455637 0.0 0.0 \n", "\n", " 50% 75% max \n", "Age 45.0 55.0 67.0 \n", "Experience 20.0 30.0 43.0 \n", "Income 64.0 98.0 224.0 \n", "ZIP Code 93437.0 94608.0 96651.0 \n", "Family 2.0 3.0 4.0 \n", "CCAvg 1.5 2.5 10.0 \n", "Education 2.0 3.0 3.0 \n", "Mortgage 0.0 101.0 635.0 \n", "Securities Account 0.0 0.0 1.0 \n", "CD Account 0.0 0.0 1.0 \n", "Online 1.0 1.0 1.0 \n", "CreditCard 0.0 1.0 1.0 " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remember: df_main_x is the main dataframe we operate on; df_original is the unmodified loaded data set, which stays pure :)\n", "# Separate the data, i.e. the input columns and the column to be predicted\n", "# Leave 'Personal Loan' out of the input dataframe, as it is the dependent variable, and copy it alone into the y dataframe\n", "\n", "# .copy() makes df_main_x an independent frame, so the .loc assignment below does not raise SettingWithCopyWarning\n", "df_main_x = df_original[['Age', 'Experience', 'Income', 'ZIP Code', 'Family', 'CCAvg', \n", "                         'Education', 'Mortgage', 'Securities Account', 'CD Account', 'Online', \n", "                         'CreditCard']].copy()\n", "\n", "# Replace all -ve values in the Experience column with 0\n", "\n", "# df_main_x.Experience[df_main_x.Experience.lt(0)] = 0 \n", "\n", "# df_main_x['Experience'] = df_main_x['Experience'].map(lambda value: value if value >=0 else 0)\n", "\n", "df_main_x.loc[df_main_x['Experience']<0, 'Experience']=0 \n", "\n", "\n", "print(\"Experience values unique 
{}\".format(validate_column(df_main_x['Experience'].unique().tolist(), 'Experience')))\n", "\n", "df_main_y = df_original['Personal Loan']\n", "\n", "\n", "# Also note that we left out the ID column, as it is just a row counter\n", "\n", "df_main_x.describe().T\n", "\n", "# Now we see that there is no -ve value in the Experience column\n" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Age</th><th>Experience</th><th>Income</th><th>ZIP Code</th><th>Family</th><th>CCAvg</th><th>Education</th><th>Mortgage</th><th>Securities Account</th><th>CD Account</th><th>Online</th><th>CreditCard</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>25</td><td>1</td><td>49</td><td>91107</td><td>4</td><td>1.6</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>1</th><td>45</td><td>19</td><td>34</td><td>90089</td><td>3</td><td>1.5</td><td>1</td><td>0</td><td>1</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>2</th><td>39</td><td>15</td><td>11</td><td>94720</td><td>1</td><td>1.0</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>3</th><td>35</td><td>9</td><td>100</td><td>94112</td><td>1</td><td>2.7</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n",
"    <tr><th>4</th><td>35</td><td>8</td><td>45</td><td>91330</td><td>4</td><td>1.0</td><td>2</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Age Experience Income ZIP Code Family CCAvg Education Mortgage \\\n", "0 25 1 49 91107 4 1.6 1 0 \n", "1 45 19 34 90089 3 1.5 1 0 \n", "2 39 15 11 94720 1 1.0 1 0 \n", "3 35 9 100 94112 1 2.7 2 0 \n", "4 35 8 45 91330 4 1.0 2 0 \n", "\n", " Securities Account CD Account Online CreditCard \n", "0 1 0 0 0 \n", "1 1 0 0 0 \n", "2 0 0 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 1 " ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# X data frame\n", "df_main_x.head()" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 0\n", "2 0\n", "3 0\n", "4 0\n", "Name: Personal Loan, dtype: int64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Y Data Frame\n", "df_main_y.head()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "# Training constants and general imports\n", "\n", "from math import sqrt\n", "\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.neighbors import KNeighborsClassifier\n", "\n", "from sklearn import metrics\n", "from sklearn.metrics import classification_report\n", "\n", "# Use a 70:30 split between training and test sets\n", "test_size = 0.30 \n", "\n", "# Random number seed for repeatability of the results\n", "seed = 29 # My BirthDate :)\n", "\n", "# Integer square root (floor of sqrt(n)) using Newton's method\n", "def isqrt(n):\n", "    x = n\n", "    y = (x + 1) // 2\n", "    while y < x:\n", "        x = y\n", "        y = (x + n // x) // 2\n", "    return x" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Logistic Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training General" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Why Logistic Regression? In linear regression the response of the system is continuous, whereas in \n", "# logistic regression it is limited to a fixed number of possible outcomes, in our case [0] or [1]: whether a person \n", "# is likely to take a loan or not ([yes] or [no])\n", "\n", "# Class LogisticRegressionProcess\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(df_main_x, df_main_y, test_size=test_size, random_state=seed) \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting Logistic Regression" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "\n", "lr_model = LogisticRegression()\n", "\n", "lr_model.fit(X_train, y_train)\n", "\n", "lr_predict = lr_model.predict(X_test)\n", "\n", "lr_score = lr_model.score(X_test, y_test)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluating Logistic Regression" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Score\n", "0.9053333333333333\n", "Model confusion matrix\n", "[[1314 49]\n", " [ 93 44]]\n", " precision recall f1-score support\n", "\n", " 0 0.93 0.96 0.95 1363\n", " 1 0.47 0.32 0.38 137\n", "\n", " accuracy 0.91 1500\n", " macro avg 0.70 0.64 0.67 1500\n", "weighted avg 0.89 0.91 0.90 1500\n", "\n" ] } ], "source": [ "print(\"Model Score\")\n", "print(lr_score)\n", "print(\"Model confusion matrix\")\n", "print(metrics.confusion_matrix(y_test, lr_predict))\n", "\n", "\n", "print(classification_report(y_test,lr_predict))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Naive Bayes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting Naive Bayes" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "nb_model = GaussianNB()\n", "\n", "nb_model.fit(X_train, y_train)\n", "\n", "y_nb_predict = nb_model.predict(X_test)\n", "\n", "nb_score = nb_model.score(X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "## Evaluating Naive Bayes" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Score\n", "0.876\n", "Model confusion matrix\n", "[[1240 123]\n", " [ 63 74]]\n", " precision recall f1-score support\n", "\n", " 0 0.95 0.91 0.93 1363\n", " 1 0.38 0.54 0.44 137\n", "\n", " accuracy 0.88 1500\n", " macro avg 0.66 0.72 0.69 1500\n", "weighted avg 0.90 0.88 0.89 1500\n", "\n" ] } ], "source": [ "print(\"Model Score\")\n", "print(nb_score)\n", "print(\"Model confusion matrix\")\n", "print(metrics.confusion_matrix(y_test, y_nb_predict))\n", "\n", "\n", "print(classification_report(y_test,y_nb_predict))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# K-NN" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predicting K-NN" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "70\n", "Knn evaluation completed, best value is 28\n" ] } ], "source": [ "knn_predict = 0\n", "knn_score = 0\n", "knn_value = 0\n", "# We have 5000 rows in total, so we try every k from 1 up to isqrt(row count)\n", "print(isqrt(df_main_x.shape[0]))\n", "for i in range(isqrt(df_main_x.shape[0])):\n", "    kvalue = i+1\n", "    knn_model = KNeighborsClassifier(n_neighbors=kvalue)\n", "    knn_model.fit(X_train, y_train)\n", "    new_knn_predict = knn_model.predict(X_test)\n", "    new_knn_score = knn_model.score(X_test, y_test)\n", "    # Keep the best score seen so far; on a tie the larger k wins\n", "    if new_knn_score >= knn_score:\n", "        knn_score = new_knn_score\n", "        knn_predict = new_knn_predict\n", "        knn_value = kvalue\n", "\n", "print(\"Knn evaluation completed, best value is {}\".format(knn_value))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluating K-NN" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model Score\n", "0.9106666666666666\n", "Model confusion matrix\n", "[[1360 3]\n", " [ 131 6]]\n", " 
precision recall f1-score support\n", "\n", " 0 0.91 1.00 0.95 1363\n", " 1 0.67 0.04 0.08 137\n", "\n", " accuracy 0.91 1500\n", " macro avg 0.79 0.52 0.52 1500\n", "weighted avg 0.89 0.91 0.87 1500\n", "\n" ] } ], "source": [ "print(\"Model Score\")\n", "print(knn_score)\n", "print(\"Model confusion matrix\")\n", "print(metrics.confusion_matrix(y_test, knn_predict))\n", "\n", "\n", "print(classification_report(y_test,knn_predict))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis Result" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model score are \n", "{'Logistic Regression': 0.9053333333333333, 'Naive Bayes': 0.876, 'K-NN': 0.9106666666666666}\n", "Best score is for K-NN with accuracy 0.9106666666666666 \n", " with kvalue 28\n" ] } ], "source": [ "\n", "\n", "results = {'Logistic Regression': lr_score, 'Naive Bayes': nb_score, 'K-NN': knn_score}\n", "\n", "print(\"Model score are \")\n", "print(results)\n", "\n", "best_score = max(results, key=results.get)\n", "\n", "print(\"Best score is for {} with accuracy {} \".format(best_score, results[best_score]))\n", "\n", "# Report knn_value (the best k found), not kvalue (the last k tried in the loop)\n", "if best_score == 'K-NN':\n", "    print(' with kvalue {}'.format(knn_value))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis Report" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our main aim was to identify the people who would accept a personal loan, based on the given data.\n", "\n", "From the output we see that K-NN turned out to be the best model, with an accuracy of 0.911. The next closest is Logistic Regression at 0.905. Even so, given the problem statement and the options we had, we can consider Logistic Regression the best approach: the output to be predicted is 0/1, and that is what logistic regression models, since it transforms its output using the sigmoid function. 
Also, Logistic Regression is a parametric algorithm, whereas K-NN is non-parametric. K-NN is also slower in practice: to find the best k value we had to fit and score a model for every candidate, and in our case the number of candidates depends on the input data size (row count).\n", "\n", "So K-NN is a reasonable choice when there is no time constraint on finding the best score, whereas Logistic Regression is the better choice under time constraints and when the target column is binary, as in this case." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }