Ontario Education & OSAP #

This notebook aims to quantify the returns to post-secondary education (the kind covered by OSAP), analyse OSAP's latest policies, and see whether the one justifies the other - whether getting tertiary education is worth it given OSAP's financing arrangements, especially the loan burden.

For this notebook, I will be using StatCan's Census 2021 PUMF individual-level data. Since we don't want to analyse structures (i.e. families), we don't need hierarchical data.

To quantify and analyse the returns to post-secondary education, we need a few things. First, we need a benchmark to compare against - the median income of someone who doesn't get post-secondary education. A counterfactual would be ideal: the sort of person who gets post-secondary education is not guaranteed to earn the same in, say, the trades as a tradesman. They may earn more, or they may earn less. Instead of comparing Person A, who gets post-secondary education, to Person B, who doesn't, a detailed analysis would compare Person A with post-secondary education to the same Person A without it. But constructing such a model is complex, so we will simply compare median earnings.

Second, we need to quantify the returns to post-secondary education itself. Median wages vary by degree, field, experience, even gender, so we will need to control for or analyse these variables separately.

Finally, we'll need to do some accounting: given OSAP's financing burden, does an investment in education (given the above models) make sense?
In technical terms, I'm going to calculate the NPV (net present value) of the earnings premium from education, net of loan repayment costs, and compare it across grant/loan ratios.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
import seaborn as sns

Pre-processing #

Census 2021's PUMF data is massive, so I will filter the file before working with it to minimise the space used. It is also complex, primarily using codes for characteristics and values. I will handle all of this early.

The characteristics I'm filtering for:

PPSORT - Unique ID
AGEGRP - Age group
CIP2021 - Field of study
EmpIn - Income: employment income
Gender - Gender
SSGRAD - Education: highest certificate, diploma or degree
LFACT - Labour: labour force status (detailed)
PR - Province

# Filter for variables and load csv
cols = ['PPSORT', 'AGEGRP', 'CIP2021', 'EmpIn', 'Gender', 'SSGRAD', 'LFACT', 'PR']
census_2021 = pd.read_csv('census_2021_ontario.csv', usecols=cols)

# Filter for Ontario only
census_2021 = census_2021[census_2021['PR'] == 35]

# Remove NA / not-applicable codes
census_2021 = census_2021[census_2021['EmpIn'] != 99999999]
census_2021 = census_2021[census_2021['EmpIn'] != 88888888]
census_2021 = census_2021[~census_2021['SSGRAD'].isin([88, 99])]
census_2021 = census_2021[~census_2021['AGEGRP'].isin([88])]
census_2021 = census_2021[~census_2021['CIP2021'].isin([88, 99])]
census_2021 = census_2021[~census_2021['LFACT'].isin([88, 99])]

# Filter for working age (20-64)
census_2021 = census_2021[census_2021['AGEGRP'].isin(range(8, 17))]

# Filter for positive income
census_2021 = census_2021[census_2021['EmpIn'] >= 0]
census_2021.head()

   PPSORT  AGEGRP  CIP2021   EmpIn  Gender  LFACT  PR  SSGRAD
0       2      11        8   12000       1      3  35       6
1       8      16        4   61000       1     13  35      11
3      10      12       13   25000       2      1  35       4
4      12      13        5  130000       1      1  35       8
8      21      10        5   63000       1      1  35      11

SSGRAD = {
    1: 'No certificate',
    2: 'Trades (no HS)',
    3: 'College/CEGEP (no HS)',
    4: 'HS',
    5: 'Trades',
    6: 'College/CEGEP',
    7: 'University below bachelor',
    8: 'Bachelor',
    9: 'University above bachelor',
    10: 'Medicine/Dentistry/Vet/Optometry',
    11: 'Masters',
    12: 'Doctorate',
    88: None,  # Not available
    99: None,  # Not applicable
}

GENDER = {1: 'Woman', 2: 'Man'}

AGEGRP = {
    1: '0-4', 2: '5-6', 3: '7-9', 4: '10-11', 5: '12-14',
    6: '15-17', 7: '18-19', 8: '20-24', 9: '25-29', 10: '30-34',
    11: '35-39', 12: '40-44', 13: '45-49', 14: '50-54', 15: '55-59',
    16: '60-64', 17: '65-69', 18: '70-74', 19: '75-79', 20: '80-84',
    21: '85+', 88: None,
}

CIP2021 = {
    1: 'Education',
    2: 'Visual/performing arts & comm',
    3: 'Humanities',
    4: 'Social sciences & law',
    5: 'Business/management/public admin',
    6: 'Physical/life sciences & tech',
    7: 'Math/CS/info',
    8: 'Architecture/engineering/trades',
    9: 'Agriculture/natural resources/conservation',
    10: 'Health',
    11: 'Personal/protective/transport services',
    12: 'Other',
    13: 'No postsecondary degree',
    88: None,
    99: None,
}

LFACT = {
    1: 'Employed',
    2: 'Employed - Absent',
    3: 'Unemployed', 4: 'Unemployed', 5: 'Unemployed', 6: 'Unemployed',
    7: 'Unemployed', 8: 'Unemployed', 9: 'Unemployed', 10: 'Unemployed',
    11: 'Not in labour force', 12: 'Not in labour force',
    13: 'Not in labour force', 14: 'Not in labour force',
    88: None,
    99: None,
}

Constructing the Alternate #

First, we will find the median income of the non-post-secondary-educated person: those with no certificate, a high school diploma, or a trades certificate.

non_ps_df = census_2021[census_2021['SSGRAD'].isin([1, 2, 4, 5])]
non_ps_df['EmpIn'].median()

np.float64(33000.0)

The median non-post-secondary-educated person in Ontario earns $33,000 per year. This figure includes the unemployed, but not those with negative income.

non_ps_df[non_ps_df['LFACT'].isin([1, 2])]['EmpIn'].median()

np.float64(42000.0)

Filtering for the employed only, that number increases to $42,000 per year.

Returns to Education #

The Mincer earnings function models an individual's earnings (the log of wages, to be specific) as a function of years of schooling and years of experience (alongside other independent variables), allowing us to calculate the value of, for instance, an additional year of schooling.
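In symbols, the specification estimated below is the standard Mincer form, here with a gender dummy added (the beta coefficients are what the regression estimates):

```latex
\ln(w_i) = \beta_0 + \beta_1\,\mathrm{edu}_i + \beta_2\,\mathrm{exp}_i + \beta_3\,\mathrm{exp}_i^2 + \beta_4\,\mathrm{male}_i + \varepsilon_i
```

The squared experience term is what lets the model capture diminishing returns to experience.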
It's identical to what I did in the last post, with two distinctions - first, we're tracking multiple independent variables (schooling, experience, gender); second, we're using the log of wages, so that coefficients can be read as percentage changes rather than absolute increases.

There are some limitations, some specific to my data and some general to the model. I only have age groups, so I will have to use midpoints - this affects years of experience. Furthermore, my formula for years of experience is simply age - (years of schooling + 6), which does not account for unemployment spells and the like.

Years of education are guessed for some groups without post-secondary education - it is impossible to get an exact figure for, say, someone who has neither an HS diploma nor a degree.

Code #

mincer_df = census_2021.copy()

# Map degree to years of schooling
edu_years = {
    1: 10,  # No certificate - assume some high school
    2: 11,  # Trades no HS - HS dropout + trades
    3: 13,  # College no HS - assume completed college
    4: 12,  # HS diploma only
    5: 13,  # Trades + HS - 12 + ~1 year trades
    6: 14,  # College/CEGEP - 12 + 2
    7: 14,  # University below bachelor - 12 + 2
    8: 16,  # Bachelor - 12 + 4
    9: 17,  # University above bachelor - 12 + 5
    10: 18, # Medicine/Dentistry/Vet - 12 + 4 + 2 specialty minimum
    11: 18, # Masters - 12 + 4 + 2
    12: 21, # Doctorate - 12 + 4 + 5
}

age_midpoint = {
    8: 22,  # 20-24
    9: 27,  # 25-29
    10: 32, # 30-34
    11: 37, # 35-39
    12: 42, # 40-44
    13: 47, # 45-49
    14: 52, # 50-54
    15: 57, # 55-59
    16: 62, # 60-64
}

mincer_df['log_wage'] = np.log(mincer_df['EmpIn'])
mincer_df['age_mid'] = mincer_df['AGEGRP'].map(age_midpoint)
mincer_df['edu_years'] = mincer_df['SSGRAD'].map(edu_years)
mincer_df['exp_years'] = mincer_df['age_mid'] - (mincer_df['edu_years'] + 6)
mincer_df['exp_years'] = mincer_df['exp_years'].clip(lower=0)
mincer_df

       PPSORT  AGEGRP  CIP2021   EmpIn  Gender  LFACT  PR  SSGRAD  log_wage  age_mid  edu_years  exp_years
0           2      11        8   12000       1      3  35       6  9.392662       37         14         17
1           8      16        4   61000       1     13  35      11 11.018629       62         18         38
3          10      12       13   25000       2      1  35       4 10.126631       42         12         24
4          12      13        5  130000       1      1  35       8 11.775290       47         16         25
8          21      10        5   63000       1      1  35      11 11.050890       32         18          8
...       ...     ...      ...     ...     ...    ...  ..     ...       ...      ...        ...        ...
378842 980851      11        8  180000       1      1  35      12 12.100712       37         21         10
378843 980852      12       13   49000       2      1  35       1 10.799576       42         10         26
378844 980856      12       10  110000       1      1  35       6 11.608236       42         14         22
378846 980862      16       13   33000       1      1  35       4 10.404263       62         12         44
378848 980866      10        5  130000       2      1  35       6 11.775290       32         14         12

182864 rows × 12 columns

model = sm.OLS.from_formula('log_wage ~ edu_years + exp_years + I(exp_years**2) + C(Gender)', data=mincer_df)
results = model.fit()
results.summary()

OLS Regression Results
Dep. Variable: log_wage        R-squared: 0.102
Model: OLS                     Adj. R-squared: 0.102
Method: Least Squares          F-statistic: 5217.
Date: Mon, 13 Apr 2026         Prob (F-statistic): 0.00
Time: 17:07:21                 Log-Likelihood: -3.3964e+05
No. Observations: 182864       AIC: 6.793e+05
Df Residuals: 182859           BIC: 6.793e+05
Df Model: 4
Covariance Type: nonrobust

                      coef     std err         t     P>|t|    [0.025    0.975]
Intercept           7.0040      0.027    264.176     0.000     6.952     7.056
C(Gender)[T.2]      0.3884      0.007     53.291     0.000     0.374     0.403
edu_years           0.1497      0.002     90.663     0.000     0.146     0.153
exp_years           0.1044      0.001     94.200     0.000     0.102     0.107
I(exp_years ** 2)  -0.0020   2.43e-05    -80.547     0.000    -0.002    -0.002

Omnibus: 142020.638            Durbin-Watson: 1.997
Prob(Omnibus): 0.000           Jarque-Bera (JB): 3451730.600
Skew: -3.609                   Prob(JB): 0.00
Kurtosis: 23.023               Cond. No. 6.39e+03

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.39e+03. This might indicate that there are strong multicollinearity or other numerical problems.
Since we are using ln(wage) (a log-level model), we need to transform the coefficients to read them as percentage effects: a coefficient b corresponds to a (exp(b) - 1) * 100 percent change.

params = results.params
adjusted_params = (np.exp(params) - 1) * 100
adjusted_params

Intercept            110006.516586
C(Gender)[T.2]           47.455093
edu_years                16.143483
exp_years                10.999062
I(exp_years ** 2)        -0.195553
dtype: float64

Interpretation #

We have three independent variables we're tracking: years of schooling, experience, and gender. Our R-squared is 0.102, which means that about 10% of the variation in log employment income is explained by these factors. All these variables are statistically significant (P>|t| < 0.05).

Our baseline is a woman with 0 years of education and experience (Gender code 1 is the omitted category). An additional year of schooling gives a 16.1% increase in wages. An additional year of experience gives 11% more wages, with a diminishing return of about 0.2pp each year.

Our data also shows that men earn 47.46% more than women, holding education and experience constant.

Importantly, we can now model incomes based on a hypothetical person's education, experience, and gender.

baseline = results.predict({'edu_years': 12, 'exp_years': 2, 'Gender': 2})  # HS diploma, 2 years of experience, male
np.exp(baseline[0])

np.float64(11957.7402224044)

The Costs of Education #

The purpose of this article is to see how OSAP policy changes affect an individual's choice of education. Starting from the 2026-27 academic year, OSAP has tweaked its grant:loan ratio to at most 25:75. The exact ratio may be as low as 0:100, depending on an individual's financial background. We will be testing variations up to the maximum.

One question is how much money our model should expect from OSAP. Again, this depends on an individual's background. Some individuals can expect near-100% coverage, while others will receive nothing. Moreover, the ratio and the level of financing are tightly linked: an individual who receives 100% coverage will likely also get the maximum 25:75 ratio, because both are determined by financial background.

Generally, the costs of post-secondary education are many. Our model covers experience - every year spent in post-secondary is one less year of experience compared to the person who skipped it. Some costs are deceptive - we could include rent and other living expenses, but these exist independently of post-secondary education.

Simple NPV of Education #

To cut through the fog, we will start with a simple calculation. We will ignore OSAP and other factors, and simply see whether, ceteris paribus, education is a worthwhile investment using the Net Present Value formula:

NPV = sum(net_cash_flow / (1 + discount_rate)^time) - costs

Our costs will be the costs of education (tuition, fees) and foregone wages (opportunity cost). Ontariocolleges.ca provides a number: $6,100 (tuition) + $800 (fees) + $1,300 (books and supplies), for a total of $8,200 per academic year. I will assume a bachelor's degree. Note that this is an average; exact costs depend on the institution. It also assumes study in Ontario, although OSAP covers out-of-province study in certain cases. Our discount rate will be the rate of interest. Our net cash flow will be earnings post-degree. Time will be a working lifetime. Gender will be male (for our example).

This can be quite confusing, so it serves to express it in tabular form once to get the general gist of how it works.
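As a warm-up, the NPV formula above can be written as a tiny helper. This is my own sketch (the `npv` name is mine), using the same 3% discount rate as the calculations that follow:

```python
def npv(cash_flows, rate=0.03):
    # discount each year's net cash flow back to year 0 and sum
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# e.g. two years of tuition costs followed by three years of earnings
npv([-8200, -8200, 20000, 21000, 22000])
```

The full calculation below does exactly this, just with a 48-year horizon and model-predicted earnings.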
First, we need to get our cash flows down.\npost_sec = [] post_sec[0:3] = [-8200] * 4 # adding college costs # getting the rest of the cash flow of our post-sec model for exp in range(22,66): post_sec.append(np.exp(results.predict({\u0026#39;edu_years\u0026#39;: 16, \u0026#39;exp_years\u0026#39;: exp-22, \u0026#39;Gender\u0026#39;: 2})[0])) hs = [] # same but for our hs model for exp in range(18,66): hs.append(np.exp(results.predict({\u0026#39;edu_years\u0026#39;: 12, \u0026#39;exp_years\u0026#39;: exp-18, \u0026#39;Gender\u0026#39;: 2})[0])) data = [post_sec, hs] cash_flow = pd.DataFrame(data=list(zip(*data)), columns=[\u0026#39;post_sec\u0026#39;, \u0026#39;hs\u0026#39;], index=range(0, 48)) cash_flow post_sec hs 0 -8200.000000 9781.623220 1 -8200.000000 10836.277815 2 -8200.000000 11957.740222 3 -8200.000000 13143.707352 4 17798.782943 14390.849470 5 19717.847682 15694.763037 6 21758.476883 17049.940925 7 23916.479808 18449.762380 8 26185.797625 19886.504769 9 28558.417591 21351.378687 10 31024.318856 22834.587510 11 33571.454203 24325.411815 12 36185.771412 25812.318479 13 38851.277159 27283.093505 14 41550.145363 28724.996933 15 44262.870809 30124.937429 16 46968.467659 31469.663498 17 49644.711148 32745.967587 18 52268.419458 33940.898824 19 54815.771412 35041.979698 20 57262.654397 36037.421651 21 59585.035758 36916.334435 22 61759.349903 37668.924029 23 63762.892569 38286.674120 24 65574.213131 38762.506430 25 67173.495531 39090.915696 26 68542.918431 39268.075679 27 69666.985421 39291.913367 28 70532.816768 39162.149351 29 71130.395013 38880.303273 30 71452.757879 38449.664208 31 71496.133281 37875.226776 32 71260.012801 37163.594721 33 70747.161605 36322.854566 34 69963.564541 35362.422716 35 68918.309890 34292.870055 36 67623.413921 33125.728596 37 66093.590987 31873.285126 38 64345.975315 30548.366986 39 62399.801840 29164.125185 40 60276.054377 27733.819916 41 57997.090143 26270.613299 42 55586.249953 24787.373747 43 53067.463571 23296.495863 44 
50464.859437 21809.739138 45 47802.387535 20338.088067 46 45103.463416 18891.635562 47 42390.640477 17479.490823

# crunching the numbers: discount both streams at 3%
cash_flow['net_ps'] = cash_flow['post_sec'] / (1 + 0.03) ** cash_flow.index
cash_flow['net_hs'] = cash_flow['hs'] / (1 + 0.03) ** cash_flow.index
male_npv = cash_flow['net_ps'].sum() - cash_flow['net_hs'].sum()
male_npv

np.float64(356221.5858905276)

The data seems to tell us that university education is a great investment - an NPV of $356,222, specifically.

There are a few complications here, though. Financing the initial investment is difficult, which is why we have to bring OSAP into our calculations; this naive number doesn't present the whole story.

The main issue specific to my data is unemployment - my numbers include the unemployed, who drag the wages down (hence the somewhat odd, below-minimum-wage figures for the initial years). Since we're doing this for both the HS and post-secondary models, it cancels out somewhat. Employment chances matter for decisions like these, so accounting for unemployment seems sensible, but it does make our figures somewhat less precise.

Moreover, we're assuming a Bachelor's degree, and not accounting for field of study. For our non-post-secondary model, we're only assuming a high-school diploma when the trades are also an option.
These variables will be covered later on.\nFor clarity\u0026rsquo;s sake, I\u0026rsquo;ll recrunch the numbers for a female model too.\npost_sec_fem = [] post_sec_fem[0:3] = [-8200] * 4 for exp in range(22,66): post_sec_fem.append(np.exp(results.predict({\u0026#39;edu_years\u0026#39;: 16, \u0026#39;exp_years\u0026#39;: exp-22, \u0026#39;Gender\u0026#39;: 1})[0])) hs_fem = [] for exp in range(18,66): hs_fem.append(np.exp(results.predict({\u0026#39;edu_years\u0026#39;: 12, \u0026#39;exp_years\u0026#39;: exp-18, \u0026#39;Gender\u0026#39;: 1})[0])) data = [post_sec_fem, hs_fem] cash_flow_fem = pd.DataFrame(data=list(zip(*data)), columns=[\u0026#39;post_sec\u0026#39;, \u0026#39;hs\u0026#39;], index=range(0, 48)) cash_flow_fem post_sec hs 0 -8200.000000 6633.628607 1 -8200.000000 7348.866430 2 -8200.000000 8109.411478 3 -8200.000000 8913.701861 4 12070.646461 9759.479442 5 13372.103537 10643.757863 6 14756.002297 11562.802341 7 16219.500697 12512.122862 8 17758.489805 13486.482148 9 19367.535597 14479.919465 10 21039.842208 15485.791012 11 22767.239544 16496.827169 12 24540.197778 17505.206490 13 26347.870675 18502.645778 14 28178.169074 19480.505138 15 30017.865075 20429.906389 16 31852.726658 21341.862730 17 33667.681597 22207.417160 18 35447.008618 23017.786755 19 37174.552852 23764.509606 20 38833.961786 24439.591039 21 40408.937134 25035.645585 22 41883.497357 25546.031208 23 43242.245046 25964.972378 24 44470.632975 26287.668801 25 45555.222439 26510.386958 26 46483.927494 26630.532000 27 47246.238899 26646.698067 28 47833.421973 26558.695669 29 48238.683150 26367.555389 30 48457.300804 26075.507786 31 48486.716813 25685.940073 32 48326.586379 25203.330730 33 47978.784763 24633.163814 34 47447.370724 23981.825273 35 46738.507686 23256.483991 36 45860.344752 22464.960662 37 44822.860793 21615.587849 38 43637.675773 20717.064702 39 42317.834296 19778.309875 40 40877.567011 18808.316068 41 39332.035967 17816.009469 42 37697.070267 16810.117083 43 35988.898421 
15799.044589 44 34223.883674 14790.766952 45 32418.268248 13792.733556 46 30587.931928 12811.789136 47 28748.169810 11854.111302

# crunching the numbers
cash_flow_fem['net_ps'] = cash_flow_fem['post_sec'] / (1 + 0.03) ** cash_flow_fem.index
cash_flow_fem['net_hs'] = cash_flow_fem['hs'] / (1 + 0.03) ** cash_flow_fem.index
fem_npv = cash_flow_fem['net_ps'].sum() - cash_flow_fem['net_hs'].sum()
fem_npv

np.float64(231476.0626947455)

For women, the NPV is $231,476. This is an interesting result - common sense tells us that women benefit more from college than men, because the sorts of jobs that don't require college degrees are less suited to them (for whatever reason). This value seems to contradict that.

However, this is an absolute number. It makes sense for the absolute number to be lower, because of the gender pay gap we quantified earlier in this notebook. Once we account for that by modelling the relative ROI, what do the results look like?

hs_male_pv = cash_flow['net_hs'].sum()
hs_fem_pv = cash_flow_fem['net_hs'].sum()
male_roi = (male_npv / hs_male_pv) * 100
fem_roi = (fem_npv / hs_fem_pv) * 100
print(f'Male ROI: {male_roi:.1f}%')
print(f'Female ROI: {fem_roi:.1f}%')

Male ROI: 52.2%
Female ROI: 50.0%

The results are similar. This does not necessarily contradict the intuition, however; our model fails to capture some considerations. For starters, our coefficient for edu_years pools female and male returns together - we would have to estimate two separate slopes for a cleaner result. The relationship between women and education is quite complex, and goes beyond a simple wage model's capabilities anyway.
That's for another day, though.

Accounting for OSAP #

Regardless, now that we have a base model, and have determined that education makes sense if you can afford the initial costs, we can move on to accounting for OSAP. We will assume 100% OSAP coverage, with variations in the grants:loans ratio. Why? Well, if you have 0% OSAP coverage you can either not afford college or will pay using an RESP or some other means: in the former case this model is useless anyway, and in the latter the answer is yes - our 'naive' model says you should go to college.

The case of varying proportions of OSAP and miscellaneous coverage is more complex (i.e. 50% OSAP, 50% other sources). However, there is no need to cover it: if 100% OSAP (meaning the largest loan burden) is still a positive investment, then a lower proportion of OSAP funds will generally be positive as well.

OSAP, including its loans, does cover expenses beyond tuition/fees/books. I made the decision not to include these, since they exist for the HS path too, and I won't include them in this model either.

OSAP loans need to be repaid starting 6 months after graduation (for simplicity's sake, I'll treat repayment as starting in the fifth year). The repayment horizon is 9.5 years, which I round to 10 in the code. As implemented below, 75% of the debt is zero-interest federal, while 25% is provincial, at an interest rate of prime + 1%. The current prime rate is 4.45% (so about 5.45%, rounded to 6% in the code), but this varies.
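The provincial portion is repaid as a fixed-payment annuity. A minimal sketch of that formula (the `annuity_payment` helper is my own, assuming annual payments and matching the 25% provincial share and 10-year term used in the osap_loan function):

```python
def annuity_payment(principal, rate, years):
    # fixed annual payment that retires `principal` over `years` at `rate`
    if rate == 0:
        return principal / years  # zero-interest: simple equal instalments
    return principal * rate * (1 + rate) ** years / ((1 + rate) ** years - 1)

# e.g. the provincial share (25%) of a full $24,600 loan, at 6% over 10 years
annuity_payment(24600 * 0.25, 0.06, 10)
```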
OSAP has a handy calculator that contains more details.

Once we are done with that, we will finally control for fields of study and exact degree variation.

def osap_loan(grant_ratio=0.25, total_osap=8200*4, interest_rate=0.06):
    '''Given the grant ratio, total OSAP amount, and interest rate, returns the annual loan payment.'''
    loan = total_osap * (1 - grant_ratio)
    loan_fed = loan * 0.75   # zero-interest federal portion
    loan_prov = loan * 0.25  # provincial portion, repaid as a 10-year annuity
    annual_payment = loan_prov * (interest_rate * (1 + interest_rate)**10) / ((1 + interest_rate)**10 - 1)
    annual_payment += loan_fed / 10
    return annual_payment

osap_loan()

2680.58794305536

def osap_npv(ratio, main_edu=16, sec_edu=12, main_model=results, sec_model=results, bFos=False, fos=None):
    '''Given OLS models, years of education for the main and counterfactual paths, and the grant ratio, returns the NPV for each gender.'''
    main_male = [0 for x in range(18, 66)]
    sec_male = [0 for x in range(18, 66)]
    main_female = [0 for x in range(18, 66)]
    sec_female = [0 for x in range(18, 66)]
    sec_range = sec_edu - 12
    for exp in range(22, 66):
        if bFos:
            main_male[exp - 18] = np.exp(main_model.predict({'edu_years': main_edu, 'exp_years': exp-22, 'Gender': 2, 'FOS': fos})[0])
            main_female[exp - 18] = np.exp(main_model.predict({'edu_years': main_edu, 'exp_years': exp-22, 'Gender': 1, 'FOS': fos})[0])
        else:
            main_male[exp - 18] = np.exp(main_model.predict({'edu_years': main_edu, 'exp_years': exp-22, 'Gender': 2})[0])
            main_female[exp - 18] = np.exp(main_model.predict({'edu_years': main_edu, 'exp_years': exp-22, 'Gender': 1})[0])
    for exp in range(18 + sec_range, 66):
        if bFos:
            sec_male[exp - 18] = np.exp(sec_model.predict({'edu_years': sec_edu, 'exp_years': exp-(18 + sec_range), 'Gender': 2, 'FOS': fos})[0])
            sec_female[exp - 18] = np.exp(sec_model.predict({'edu_years': sec_edu, 'exp_years': exp-(18 + sec_range), 'Gender': 1, 'FOS': fos})[0])
        else:
            sec_male[exp - 18] = np.exp(sec_model.predict({'edu_years': sec_edu, 'exp_years': exp-(18 + sec_range), 'Gender': 2})[0])
            sec_female[exp - 18] = np.exp(sec_model.predict({'edu_years': sec_edu, 'exp_years': exp-(18 + sec_range), 'Gender': 1})[0])
    annual_payment = osap_loan(grant_ratio=ratio/100)
    # loan repayment hits years 4-13: the ten years after graduation
    main_male[4:14] = [x - annual_payment for x in main_male[4:14]]
    main_female[4:14] = [x - annual_payment for x in main_female[4:14]]
    data = [main_male, sec_male, main_female, sec_female]
    cash_flow_osap = pd.DataFrame(data=list(zip(*data)), columns=['main_male', 'sec_male', 'main_female', 'sec_female'], index=range(0, 48))
    cash_flow_osap['net_main_male'] = cash_flow_osap['main_male'] / (1 + 0.03) ** cash_flow_osap.index
    cash_flow_osap['net_sec_male'] = cash_flow_osap['sec_male'] / (1 + 0.03) ** cash_flow_osap.index
    cash_flow_osap['net_main_female'] = cash_flow_osap['main_female'] / (1 + 0.03) ** cash_flow_osap.index
    cash_flow_osap['net_sec_female'] = cash_flow_osap['sec_female'] / (1 + 0.03) ** cash_flow_osap.index
    return [cash_flow_osap['net_main_male'].sum() - cash_flow_osap['net_sec_male'].sum(),
            cash_flow_osap['net_main_female'].sum() - cash_flow_osap['net_sec_female'].sum()]

ratios = range(0, 26)  # from 0% grants to 25% grants, the maximum
osap_npvs = []
for ratio in ratios:
    osap_npvs.append(osap_npv(ratio=ratio))
pd.DataFrame(osap_npvs, columns=['Male', 'Female'])

    Male           Female
0   359715.410269  234969.887073
1   359994.418157  235248.894961
2   360273.426044  235527.902848
3   360552.433931  235806.910735
4   360831.441819  236085.918623
5   361110.449706  236364.926510
6   361389.457593  236643.934397
7   361668.465480  236922.942285
8   361947.473368  237201.950172
9   362226.481255  237480.958059
10  362505.489142  237759.965947
11  362784.497030  238038.973834
12  363063.504917  238317.981721
13  363342.512804  238596.989609
14  363621.520692  238875.997496
15  363900.528579  239155.005383
16  364179.536466  239434.013271
17  364458.544354  239713.021158
18  364737.552241  239992.029045
19  365016.560128  240271.036932
20  365295.568016  240550.044820
21  365574.575903  240829.052707
22  365853.583790  241108.060594
23  366132.591677  241387.068482
24  366411.599565  241666.076369
25  366690.607452  241945.084256

The results are about as clear as they get - post-secondary education, no matter the grant ratio, is a good investment relative to going through life with only a high school diploma. The main reasons for the 'no matter' part are simple: OSAP loans are 75% federal, and the federal portion is zero-interest. A zero-interest loan is little different from the one-time expenses we considered in the naive model. And since up to 25% of your allowance is grants - free money - that helps too.

Of course, most critics of college do not recommend just a high school diploma.
They often recommend the skilled trades.\nratios = range(0,26) # from 0% grants to 25% grants, the maximum osap_npvs = [] for ratio in ratios: osap_npvs.append(osap_npv(sec_edu=13, ratio=ratio)) pd.DataFrame(osap_npvs, columns=[\u0026#39;Male\u0026#39;, \u0026#39;Female\u0026#39;]) Male Female 0 277557.536966 179252.670575 1 277836.544853 179531.678463 2 278115.552741 179810.686350 3 278394.560628 180089.694237 4 278673.568515 180368.702125 5 278952.576403 180647.710012 6 279231.584290 180926.717899 7 279510.592177 181205.725787 8 279789.600065 181484.733674 9 280068.607952 181763.741561 10 280347.615839 182042.749449 11 280626.623727 182321.757336 12 280905.631614 182600.765223 13 281184.639501 182879.773110 14 281463.647389 183158.780998 15 281742.655276 183437.788885 16 282021.663163 183716.796772 17 282300.671050 183995.804660 18 282579.678938 184274.812547 19 282858.686825 184553.820434 20 283137.694712 184832.828322 21 283416.702600 185111.836209 22 283695.710487 185390.844096 23 283974.718374 185669.851984 24 284253.726262 185948.859871 25 284532.734149 186227.867758 This dents our numbers by ~$55k for women at the pessimistic end, and ~$80k for men, but we\u0026rsquo;re still solidly in the green.\nGeneral Statistics about Education # A sort of addendum. 
Here I visualise some interesting data before we go into the final section.

mapped_df = mincer_df.copy()
mapped_df['FOS'] = mapped_df['CIP2021'].map(CIP2021)
mapped_df['Degree'] = mapped_df['SSGRAD'].map(SSGRAD)
mapped_df['gen'] = mapped_df['Gender'].map(GENDER)
mapped_df

       PPSORT  AGEGRP  CIP2021   EmpIn  Gender  LFACT  PR  SSGRAD  log_wage  age_mid  edu_years  exp_years  FOS  Degree  gen
0           2      11        8   12000       1      3  35       6  9.392662       37         14         17  Architecture/engineering/trades  College/CEGEP  Woman
1           8      16        4   61000       1     13  35      11 11.018629       62         18         38  Social sciences & law  Masters  Woman
3          10      12       13   25000       2      1  35       4 10.126631       42         12         24  No postsecondary degree  HS  Man
4          12      13        5  130000       1      1  35       8 11.775290       47         16         25  Business/management/public admin  Bachelor  Woman
8          21      10        5   63000       1      1  35      11 11.050890       32         18          8  Business/management/public admin  Masters  Woman
...       ...     ...      ...     ...     ...    ...  ..     ...       ...      ...        ...        ...  ...  ...  ...
378842 980851      11        8  180000       1      1  35      12 12.100712       37         21         10  Architecture/engineering/trades  Doctorate  Woman
378843 980852      12       13   49000       2      1  35       1 10.799576       42         10         26  No postsecondary degree  No certificate  Man
378844 980856      12       10  110000       1      1  35       6 11.608236       42         14         22  Health  College/CEGEP  Woman
378846 980862      16       13   33000       1      1  35       4 10.404263       62         12         44  No postsecondary degree  HS  Woman
378848 980866      10        5  130000       2      1  35       6 11.775290       32         14         12  Business/management/public admin  College/CEGEP  Man

182864 rows × 15 columns

sns.set_theme(style='whitegrid', font='Monospace', font_scale=1, rc={'figure.figsize': (14, 8)})
median_deg = mapped_df.groupby('Degree')['EmpIn'].median().sort_values()
g = sns.barplot(median_deg, palette='Blues_d', legend=False)
g.set_xticklabels(g.get_xticklabels(), rotation=45, horizontalalignment='right')
g.set_xlabel('Degree')
g.set_ylabel('Median Employment Income (CAD$)')
g.set_title('Median Income by Degree', y=1.02)
plt.show()

We see a fairly linear increase as the level of education rises, but not all degree jumps are equal. Getting a bachelor (vs HS) increases your earnings by roughly 80%, whereas a masters merely adds ~$20k on top of that, for a 2-year investment. It may still be justified as a step towards a doctorate, which costs an additional ~3 years of study but carries nearly double the median earnings of a bachelor - it would not take long to earn the money back.
The difference between the trades and a bachelor\u0026rsquo;s, expressed like this, is minor - and looks worse once you consider the trades\u0026rsquo; lower time expenditure and debt burden; however, we\u0026rsquo;ve already calculated the NPV and found that a bachelor\u0026rsquo;s still makes sense.\nmedian_fos = mapped_df.groupby(\u0026#39;FOS\u0026#39;)[\u0026#39;EmpIn\u0026#39;].median().sort_values() g = sns.barplot(median_fos, palette=\u0026#39;Blues_d\u0026#39;) g.set_xticklabels(g.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) g.set_xlabel(\u0026#39;Field of Study\u0026#39;) g.set_ylabel(\u0026#39;Median Employment Income (CAD$)\u0026#39;) g.set_title(\u0026#39;Median Income by Field of Study\u0026#39;, y=1.02) plt.show() There is a wide variation in median earnings by field of study as well - STEM (and, curiously, \u0026lsquo;education\u0026rsquo;) pays the most and the arts and humanities pay the least. Education can be explained by the fact that teachers are paid handsomely in Ontario \u0026lsquo;with salaries ranging from $65,000 to $110,000 per year\u0026rsquo;. 
As a category, social sciences \u0026amp; law may make sense academically but does complicate our analysis: law degrees and sociology majors will likely get paid quite differently.\ngen_median_deg = mapped_df.groupby([\u0026#39;gen\u0026#39;, \u0026#39;Degree\u0026#39;])[\u0026#39;EmpIn\u0026#39;].median().sort_values() fig, axes = plt.subplots(1, 2, figsize=(14, 8), sharey=False) genders = [\u0026#39;Woman\u0026#39;, \u0026#39;Man\u0026#39;] for ax, gender in zip(axes, genders): data = gen_median_deg[gender] sns.barplot(x=data.index, y=data.values, ax=ax, palette=\u0026#39;Blues_d\u0026#39;, hue=data.index) ax.set_title(gender) ax.set_xlabel(\u0026#39;\u0026#39;) ax.set_ylabel(\u0026#39;Median Employment Income (CAD$)\u0026#39; if gender == \u0026#39;Woman\u0026#39; else \u0026#39;\u0026#39;) ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) fig.suptitle(\u0026#39;Median Income by Degree and Gender\u0026#39;, y=1.02) plt.tight_layout() plt.show() gap_deg = gen_median_deg.unstack(level=0) gap_deg[\u0026#39;Gap\u0026#39;] = ((gap_deg[\u0026#39;Woman\u0026#39;] / gap_deg[\u0026#39;Man\u0026#39;]) * 100) g = sns.barplot(gap_deg[\u0026#39;Gap\u0026#39;].sort_values(ascending=False), palette=\u0026#39;Blues_d\u0026#39;) g.set_xticklabels(g.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) g.set_xlabel(\u0026#39;Field of Study\u0026#39;) g.set_ylabel(\u0026#39;Gender Gap (Female Earnings as % of Men)\u0026#39;) g.set_title(\u0026#39;Gender Gap by Degree\u0026#39;, y=1.02) plt.show() This data is unsurprising. The more educated women are, the more they surmount the gender pay gap - the situation, as common-sense would tell you, is worst in the trades. What\u0026rsquo;s surprising is that women without any certification (in terms of gender gap relative to male counterparts, not absolute income) do better in this regard. 
Of course, \u0026lsquo;No certificate\u0026rsquo; is an odd category, likely a small number of immigrants or refugees who don\u0026rsquo;t have any Canadian certifications, and you shouldn\u0026rsquo;t read too much into it. The reasons for the pay gap declining by degree are manifold, and beyond the scope of this post.\ngen_median_fos = mapped_df.groupby([\u0026#39;gen\u0026#39;, \u0026#39;FOS\u0026#39;])[\u0026#39;EmpIn\u0026#39;].median() fig, axes = plt.subplots(1, 2, figsize=(20, 8), sharey=False) genders = [\u0026#39;Woman\u0026#39;, \u0026#39;Man\u0026#39;] for ax, gender in zip(axes, genders): data = gen_median_fos[gender].sort_values() sns.barplot(x=data.index, y=data.values, ax=ax, palette=\u0026#39;Blues_d\u0026#39;, hue=data.index) ax.set_title(gender) ax.set_xlabel(\u0026#39;\u0026#39;) ax.set_ylabel(\u0026#39;Median Employment Income (CAD$)\u0026#39; if gender == \u0026#39;Woman\u0026#39; else \u0026#39;\u0026#39;) ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) fig.suptitle(\u0026#39;Median Income by Field of Study and Gender\u0026#39;, y=1.02) plt.tight_layout() plt.show() gap = gen_median_fos.unstack(level=0) gap[\u0026#39;Gap\u0026#39;] = ((gap[\u0026#39;Woman\u0026#39;] / gap[\u0026#39;Man\u0026#39;]) * 100) g = sns.barplot(gap[\u0026#39;Gap\u0026#39;].sort_values(ascending=False), palette=\u0026#39;Blues_d\u0026#39;) g.set_xticklabels(g.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) g.set_xlabel(\u0026#39;Field of Study\u0026#39;) g.set_ylabel(\u0026#39;Gender Pay Gap (Female Earnings as % of Men)\u0026#39;) g.set_title(\u0026#39;Gender Gap by Field of Study\u0026#39;, y=1.02) plt.show() Women in the performing arts and \u0026lsquo;comm\u0026rsquo;, and in STEM, face the smallest gender pay gaps. The pay gap is worst in \u0026lsquo;personal/protective/transport services\u0026rsquo;, the non-degree category, and social sciences \u0026amp; law. 
The latter may be explained primarily through the law subcategory. Lawyer jobs are extremely demanding (\u0026lsquo;greedy\u0026rsquo;) and women tend to avoid such jobs; female representation in that category will therefore likely be concentrated in the lower-paying social science degrees, skewing the category\u0026rsquo;s gap.\nField of Study Comparisons # Back to the plot. Rationally speaking, there\u0026rsquo;s really no argument against even maximal changes to the loan and grant ratio for OSAP. Of course, (future) students are relatively worse off and that is a negative, regardless of whether you think it\u0026rsquo;s a sufficient criticism (for that, you\u0026rsquo;d need to analyse a whole lot more than just ROI, including Ontario\u0026rsquo;s fiscal space). But is that really all there is to it? Let\u0026rsquo;s break down the analysis into more granular terms to get a clearer picture before our final conclusion. We will calculate NPV by the following categories:\nField of Study Gender We could also do so by degree, but degree finances get more and more complicated as you go up the chain - doctorates being worst in this regard. Such an analysis would require a separate notebook.\nI will do two separate analyses: one will compare a bachelor\u0026rsquo;s to a HS diploma, the other will compare a bachelor\u0026rsquo;s to a trades degree, by field of study.\nBoth analyses will use a pessimistic scenario for OSAP ratios: 100% loans.\n# Filters for post-secondary degrees ps_df = mapped_df.copy() # This model doesn\u0026#39;t include degree to prevent multicollinearity with edu_years model_fos = sm.OLS.from_formula(\u0026#39;log_wage ~ edu_years + C(FOS)*exp_years + C(FOS)*I(exp_years**2) + C(Gender)\u0026#39;, data=ps_df) results_fos = model_fos.fit() results_fos.summary() OLS Regression Results Dep. Variable: log_wage R-squared: 0.113 Model: OLS Adj. 
R-squared: 0.112 Method: Least Squares F-statistic: 626.4 Date: Mon, 13 Apr 2026 Prob (F-statistic): 0.00 Time: 17:07:43 Log-Likelihood: -3.3861e+05 No. Observations: 182864 AIC: 6.773e+05 Df Residuals: 182826 BIC: 6.777e+05 Df Model: 37 Covariance Type: nonrobust coef std err t P\u003e|t| [0.025 0.975] Intercept 7.6506 0.105 72.889 0.000 7.445 7.856 C(FOS)[T.Architecture/engineering/trades] 0.0265 0.101 0.261 0.794 -0.172 0.225 C(FOS)[T.Business/management/public admin] 0.0067 0.100 0.067 0.947 -0.190 0.203 C(FOS)[T.Education] -0.2288 0.122 -1.880 0.060 -0.467 0.010 C(FOS)[T.Health] 0.0384 0.102 0.376 0.707 -0.162 0.239 C(FOS)[T.Humanities] -0.3519 0.112 -3.130 0.002 -0.572 -0.132 C(FOS)[T.Math/CS/info] -0.0971 0.110 -0.885 0.376 -0.312 0.118 C(FOS)[T.No postsecondary degree] -0.6518 0.099 -6.571 0.000 -0.846 -0.457 C(FOS)[T.Personal/protective/transport services] -0.1255 0.112 -1.118 0.263 -0.345 0.094 C(FOS)[T.Physical/life sciences \u0026 tech] -0.5344 0.107 -4.984 0.000 -0.745 -0.324 C(FOS)[T.Social sciences \u0026 law] -0.1646 0.101 -1.623 0.105 -0.363 0.034 C(FOS)[T.Visual/performing arts \u0026 comm] -0.5159 0.114 -4.528 0.000 -0.739 -0.293 C(Gender)[T.2] 0.3653 0.008 46.602 0.000 0.350 0.381 edu_years 0.1216 0.003 45.412 0.000 0.116 0.127 exp_years 0.1069 0.011 10.079 0.000 0.086 0.128 C(FOS)[T.Architecture/engineering/trades]:exp_years 0.0014 0.011 0.126 0.900 -0.020 0.023 C(FOS)[T.Business/management/public admin]:exp_years -0.0074 0.011 -0.672 0.501 -0.029 0.014 C(FOS)[T.Education]:exp_years 0.0234 0.013 1.795 0.073 -0.002 0.049 C(FOS)[T.Health]:exp_years -0.0220 0.011 -1.955 0.051 -0.044 5.06e-05 C(FOS)[T.Humanities]:exp_years -0.0027 0.012 -0.216 0.829 -0.027 0.022 C(FOS)[T.Math/CS/info]:exp_years 0.0123 0.012 1.023 0.306 -0.011 0.036 C(FOS)[T.No postsecondary degree]:exp_years 0.0130 0.011 1.200 0.230 -0.008 0.034 C(FOS)[T.Personal/protective/transport services]:exp_years -0.0107 0.012 -0.880 0.379 -0.035 0.013 C(FOS)[T.Physical/life sciences \u0026 
tech]:exp_years 0.0429 0.012 3.526 0.000 0.019 0.067 C(FOS)[T.Social sciences \u0026 law]:exp_years 0.0043 0.011 0.380 0.704 -0.018 0.026 C(FOS)[T.Visual/performing arts \u0026 comm]:exp_years 0.0036 0.013 0.282 0.778 -0.021 0.028 I(exp_years ** 2) -0.0024 0.000 -10.098 0.000 -0.003 -0.002 C(FOS)[T.Architecture/engineering/trades]:I(exp_years ** 2) 0.0002 0.000 0.685 0.494 -0.000 0.001 C(FOS)[T.Business/management/public admin]:I(exp_years ** 2) 0.0004 0.000 1.651 0.099 -7.67e-05 0.001 C(FOS)[T.Education]:I(exp_years ** 2) -0.0004 0.000 -1.408 0.159 -0.001 0.000 C(FOS)[T.Health]:I(exp_years ** 2) 0.0008 0.000 3.033 0.002 0.000 0.001 C(FOS)[T.Humanities]:I(exp_years ** 2) 0.0003 0.000 1.020 0.308 -0.000 0.001 C(FOS)[T.Math/CS/info]:I(exp_years ** 2) -9.931e-05 0.000 -0.361 0.718 -0.001 0.000 C(FOS)[T.No postsecondary degree]:I(exp_years ** 2) 0.0004 0.000 1.465 0.143 -0.000 0.001 C(FOS)[T.Personal/protective/transport services]:I(exp_years ** 2) 0.0004 0.000 1.330 0.183 -0.000 0.001 C(FOS)[T.Physical/life sciences \u0026 tech]:I(exp_years ** 2) -0.0008 0.000 -2.706 0.007 -0.001 -0.000 C(FOS)[T.Social sciences \u0026 law]:I(exp_years ** 2) 0.0001 0.000 0.498 0.619 -0.000 0.001 C(FOS)[T.Visual/performing arts \u0026 comm]:I(exp_years ** 2) 0.0002 0.000 0.625 0.532 -0.000 0.001 Omnibus: 142563.407 Durbin-Watson: 1.997 Prob(Omnibus): 0.000 Jarque-Bera (JB): 3512451.082 Skew: -3.625 Prob(JB): 0.00 Kurtosis: 23.210 Cond. No. 9.34e+04 Notes:[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.[2] The condition number is large, 9.34e+04. This might indicate that there arestrong multicollinearity or other numerical problems. 
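For reference, here is a stripped-down, self-contained sketch of the kind of calculation the osap_npv helper (defined earlier in the notebook) performs: discount an earnings premium net of loan annuity payments. The premium and loan figures below are placeholders rather than model estimates; the 6% rate and 10-year repayment horizon follow the simplifications noted in the limitations section:

```python
def npv_of_degree(annual_premium, loan, rate=0.06, work_years=40,
                  study_years=4, repay_years=10):
    """NPV of an assumed annual earnings premium, net of loan repayment.

    Simplified sketch: ignores the grace period, taxes, and the
    grant/loan split that the full osap_npv handles."""
    # Standard annuity payment on the loan balance at graduation
    payment = loan * rate / (1 - (1 + rate) ** -repay_years)
    npv = 0.0
    for t in range(study_years, study_years + work_years):
        cash = annual_premium
        if t < study_years + repay_years:
            cash -= payment  # loan repaid over the first working years
        npv += cash / (1 + rate) ** (t + 1)
    return npv

# Placeholder inputs: a $20k/yr premium financed by a $30k, 100%-loan package
print(round(npv_of_degree(20000, 30000)))
```

The real helper swaps the placeholder premium for predicted earnings paths from the regression models above.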
hs_npvs = {} for fos in ps_df[\u0026#39;FOS\u0026#39;].dropna().unique(): if fos == \u0026#39;No postsecondary degree\u0026#39;: continue npvs = osap_npv(ratio=0, main_model=results_fos, sec_model=results, bFos=True, fos=fos) hs_npvs[fos] = npvs hs_npvs_df = pd.DataFrame.from_dict(hs_npvs, orient=\u0026#39;index\u0026#39;, columns=[\u0026#39;Male\u0026#39;, \u0026#39;Female\u0026#39;]) hs_npvs_df Male Female Architecture/engineering/trades 511710.016950 357405.052259 Social sciences \u0026amp; law 333757.556643 233903.671114 Business/management/public admin 427342.998907 298853.216373 Math/CS/info 458971.503149 320803.820208 Health 349597.080201 244896.511745 Education 378718.398483 265107.095080 Humanities 109472.619446 78246.932983 Personal/protective/transport services 203982.747130 143838.222258 Physical/life sciences \u0026amp; tech 278627.998659 195643.022505 Visual/performing arts \u0026amp; comm 35355.603031 26808.734951 Agriculture/natural resources/conservation 358928.782511 251372.837624 fig, axes = plt.subplots(1, 2, figsize=(20, 8), sharey=False) genders = [\u0026#39;Male\u0026#39;, \u0026#39;Female\u0026#39;] for ax, gender in zip(axes, genders): data = hs_npvs_df[gender].sort_values() sns.barplot(x=data.index, y=data.values, ax=ax, palette=\u0026#39;Blues_d\u0026#39;, hue=data.index) ax.set_title(gender) ax.set_xlabel(\u0026#39;\u0026#39;) ax.set_ylabel(\u0026#39;Net Present Value (CAD$)\u0026#39; if gender == \u0026#39;Male\u0026#39; else \u0026#39;\u0026#39;) ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) fig.suptitle(\u0026#39;Net Present Value of Bachelors by Field of Study and Gender\u0026#39;, y=1.02) plt.tight_layout() plt.show() We see an extremely wide variation in NPV as well. The counterfactual here is a high school degree. 
Overall, all fields of study are still a net positive, but the arts are close to having a negative ROI.\ntrade_npvs = {} for fos in ps_df[\u0026#39;FOS\u0026#39;].dropna().unique(): if fos == \u0026#39;No postsecondary degree\u0026#39;: continue npvs = osap_npv(ratio=0, main_model=results_fos, sec_model=results, bFos=True, fos=fos, sec_edu=13) trade_npvs[fos] = npvs trade_npvs_df = pd.DataFrame.from_dict(trade_npvs, orient=\u0026#39;index\u0026#39;, columns=[\u0026#39;Male\u0026#39;, \u0026#39;Female\u0026#39;]) trade_npvs_df Male Female Architecture/engineering/trades 429552.143647 301687.835761 Social sciences \u0026amp; law 251599.683340 178186.454616 Business/management/public admin 345185.125603 243135.999875 Math/CS/info 376813.629846 265086.603709 Health 267439.206897 189179.295247 Education 296560.525180 209389.878582 Humanities 27314.746143 22529.716485 Personal/protective/transport services 121824.873827 88121.005760 Physical/life sciences \u0026amp; tech 196470.125356 139925.806007 Visual/performing arts \u0026amp; comm -46802.270272 -28908.481547 Agriculture/natural resources/conservation 276770.909208 195655.621126 fig, axes = plt.subplots(1, 2, figsize=(20, 8), sharey=False) genders = [\u0026#39;Male\u0026#39;, \u0026#39;Female\u0026#39;] for ax, gender in zip(axes, genders): data = trade_npvs_df[gender].sort_values() sns.barplot(x=data.index, y=data.values, ax=ax, palette=\u0026#39;Blues_d\u0026#39;, hue=data.index) ax.set_title(gender) ax.set_xlabel(\u0026#39;\u0026#39;) ax.set_ylabel(\u0026#39;Net Present Value (CAD$)\u0026#39; if gender == \u0026#39;Male\u0026#39; else \u0026#39;\u0026#39;) ax.set_xticklabels(ax.get_xticklabels(), rotation=45, horizontalalignment=\u0026#39;right\u0026#39;) fig.suptitle(\u0026#39;Net Present Value of Bachelors by Field of Study and Gender\u0026#39;, y=1.02) plt.tight_layout() plt.show() We finally get our first instance of education with a negative ROI - compared to becoming a skilled tradesman, getting an arts 
degree is a bad investment. Humanities comes close to zero but stays positive, with an NPV of ~$20k for both genders. Beyond that, however, most fields still offer a substantial (\u0026gt;$100k) positive ROI for both genders.\nConclusion # In sum, education offers a positive ROI regardless of gender, grant/loan ratios, and field of study (with the sole exception of visual/performing arts vs trades) relative to a high school diploma and trades. There is wide variance between fields of study and degree type, and a substantial gender gap that also varies by FoS/Degree.\nOn average, men earn ~47% more than women controlling for education and experience. This is a \u0026lsquo;naive\u0026rsquo; gap - the entire difference is not made up solely of discrimination, since our model does not account for many variables. Generally, the gender gap decreases as women climb the education ladder and it is lowest in the Arts and STEM fields (this is just the gap, and does not account for absolute income). There is a substantial pay gap in the trades, as well. Put together, this tells us that education is likely a more important investment for women relative to men.\nOSAP\u0026rsquo;s recent changes make no difference to the NPV of education. Through a sensitivity analysis comparing grant/loan ratios, I found that even in the worst case scenario (100% loans), the NPV of education (compared to a counterfactual trades degree) is $270k for men and $180k for women. The OSAP ratio change actually makes very little difference at all - this is largely because 70% of federal loans are interest-free. The number does vary by field of study - visual/performing arts degrees are negative (roughly -$47k for men and -$29k for women vs trades), and humanities degrees offer an insubstantial ROI (~$20k for both genders), but most other fields offer \u0026gt;$100k returns. This tells us that recent or anticipated OSAP changes should not affect your college plans, except where your liquidity is questionable (i.e. 
OSAP does not cover 100% of college costs, and you cannot make up the difference).\nLimitations # Many of these limitations are for simplicity\u0026rsquo;s sake - to prevent this notebook from becoming too large. It bears listing them, however. There are also specific limitations listed in their relevant sections.\nMy models factor unemployment in through its effect (drag) on income, not as a separate variable. A more complex model would handle this factor separately and properly. This could be done by modelling employment probability separately and combining that with income predictions. Experience calculation is similarly affected - few people would have 45 years of experience at the end of their working lifetime, because that assumes continuous employment (i.e. no unemployment). My models do not control for many variables - hence the low R^2 of ~0.1. These include specific factors relevant to income, like part-time/full-time work or hours worked per week, and general factors like race or immigration status. My models do not account for the weights given by PUMF. That means my model is not ideal for population-level results - this is fine for measuring relationships, but hurts my precision. Immigrants might deserve a separate notebook, especially those with foreign education. I use an interest rate of 6% throughout my calculations. An extension of this notebook would be to perform a sensitivity analysis on this rate. There are some simplifications when calculating OSAP loan annuities. There is a 6-month grace period, and the actual time horizon is 9.5 years (where I use 10). Since my cash flow models annual income, it is difficult to accommodate this - hence the simplification. There are also tax considerations vis-a-vis both income and student loan tax credits that I have ignored. 
","date":"13 April 2026","externalUrl":null,"permalink":"/posts/ontario-education--osap/","section":"Posts","summary":"","title":"Ontario Education \u0026 OSAP","type":"posts"},{"content":"","date":"13 April 2026","externalUrl":null,"permalink":"/posts/","section":"Posts","summary":"","title":"Posts","type":"posts"},{"content":"","date":"13 April 2026","externalUrl":null,"permalink":"/series/","section":"Series","summary":"","title":"Series","type":"series"},{"content":"import pandas as pd import matplotlib.pyplot as plt import seaborn as sns import statsmodels.api as sm sns.set_theme() Testing the Phillips Curve # The Phillips Curve argues that there is a negative correlation between inflation and unemployment - when inflation increases, unemployment decreases, and vice versa. The mechanism this occurs by is hypothesised as such:\nSuppose that unemployment has decreased Employees have more bargaining power, because of the decrease in supply of labor, and can demand higher wages Higher wages lead to higher costs of production These costs are passed onto consumers, leading to an increase in the general price level (inflation) This traditional formulation was empirically disproven by the oil shock after 1973. This brought on a decade of stagflation, where increases in inflation did not correlate with decreases in unemployment. It was replaced with the Expectations Augmented Phillips Curve, which provides a fuller picture, taking the long-run and worker expectations into account:\nIn the long run, employees will see their real wages decline This will incite them to demand greater wages This will force firms to lay off workers Employment will return to its natural rate, but inflation will still be high The Traditional Phillips Curve # By plotting unemployment and inflation data, we can get a general idea of their correlation. 
I will use FRED\u0026rsquo;s annual inflation data for the US, not seasonally adjusted, and its monthly unemployment data, seasonally adjusted.\ninf = pd.read_csv(\u0026#39;inf.csv\u0026#39;, index_col=\u0026#39;observation_date\u0026#39;, parse_dates=True).rename(columns={\u0026#39;FPCPITOTLZGUSA\u0026#39;: \u0026#39;inflation_rate\u0026#39;}) inf inflation_rate observation_date 1960-01-01 1.457976 1961-01-01 1.070724 1962-01-01 1.198773 1963-01-01 1.239669 1964-01-01 1.278912 ... ... 2020-01-01 1.233584 2021-01-01 4.697859 2022-01-01 8.002800 2023-01-01 4.116338 2024-01-01 2.949525 65 rows × 1 columns\nunemp = pd.read_csv(\u0026#39;unemp.csv\u0026#39;, index_col=\u0026#39;observation_date\u0026#39;, parse_dates=True).rename(columns={\u0026#39;UNRATE\u0026#39;: \u0026#39;unemployment_rate\u0026#39;}) unemp unemployment_rate observation_date 1948-01-01 3.4 1948-02-01 3.8 1948-03-01 4.0 1948-04-01 3.9 1948-05-01 3.5 ... ... 2025-10-01 NaN 2025-11-01 4.5 2025-12-01 4.4 2026-01-01 4.3 2026-02-01 4.4 938 rows × 1 columns\nNow we have to merge the dataframes. Our unemployment data is monthly, so we have to resample it to an annual basis. 
Annual unemployment is calculated as the average of the 12 monthly figures.\nunemp_inf = pd.merge(unemp.resample(\u0026#39;YS\u0026#39;).mean(), inf, how=\u0026#39;left\u0026#39;, on=\u0026#39;observation_date\u0026#39;) unemp_inf.tail(20) unemployment_rate inflation_rate observation_date 2007-01-01 4.616667 2.852672 2008-01-01 5.800000 3.839100 2009-01-01 9.283333 -0.355546 2010-01-01 9.608333 1.640043 2011-01-01 8.933333 3.156842 2012-01-01 8.075000 2.069337 2013-01-01 7.358333 1.464833 2014-01-01 6.158333 1.622223 2015-01-01 5.275000 0.118627 2016-01-01 4.875000 1.261583 2017-01-01 4.358333 2.130110 2018-01-01 3.891667 2.442583 2019-01-01 3.675000 1.812210 2020-01-01 8.100000 1.233584 2021-01-01 5.350000 4.697859 2022-01-01 3.650000 8.002800 2023-01-01 3.625000 4.116338 2024-01-01 4.025000 2.949525 2025-01-01 4.263636 NaN 2026-01-01 4.350000 NaN Our inflation data is missing for many dates. We must drop the rows where we don\u0026rsquo;t have matching unemployment and inflation data.\nunemp_inf = unemp_inf.dropna() unemp_inf unemployment_rate inflation_rate observation_date 1960-01-01 5.541667 1.457976 1961-01-01 6.691667 1.070724 1962-01-01 5.566667 1.198773 1963-01-01 5.641667 1.239669 1964-01-01 5.158333 1.278912 ... ... ... 2020-01-01 8.100000 1.233584 2021-01-01 5.350000 4.697859 2022-01-01 3.650000 8.002800 2023-01-01 3.625000 4.116338 2024-01-01 4.025000 2.949525 65 rows × 2 columns\nsns.lineplot(unemp_inf) \u0026lt;Axes: xlabel='observation_date'\u0026gt; Fig 1: Unemployment and Inflation What\u0026rsquo;s cool about this specific data is that we can trace the history of economic thought, specifically the Phillips\u0026rsquo; Curve, through it. If we look at the 1960-70 period, we see the negative correlation the traditional model argues for. 
1970 onwards, we see that correlation completely collapse.\nsixties = unemp_inf[\u0026#39;1960-01-01\u0026#39;:\u0026#39;1967-01-01\u0026#39;] sns.lineplot(sixties) \u0026lt;Axes: xlabel='observation_date'\u0026gt; Fig 2: Unemployment and Inflation in the 60s sixties[\u0026#39;unemployment_rate\u0026#39;].corr(sixties[\u0026#39;inflation_rate\u0026#39;]) np.float64(-0.8812049507014807) We see a reasonably strong negative correlation, as Bill Phillips demonstrated.\nsns.regplot(x=sixties[\u0026#39;inflation_rate\u0026#39;], y=sixties[\u0026#39;unemployment_rate\u0026#39;]) \u0026lt;Axes: xlabel='inflation_rate', ylabel='unemployment_rate'\u0026gt; Fig 3: A Regression Plot of the Same Stagflation # seventies = unemp_inf[\u0026#39;1970-01-01\u0026#39;:\u0026#39;1980-01-01\u0026#39;] sns.lineplot(seventies) \u0026lt;Axes: xlabel='observation_date'\u0026gt; Fig 4: Unemployment and Inflation in the 70s seventies[\u0026#39;unemployment_rate\u0026#39;].corr(seventies[\u0026#39;inflation_rate\u0026#39;]) np.float64(0.2663010944577987) We see a weak positive correlation. Simultaneous highs of inflation and unemployment seemed to disprove the Traditional Phillips Curve.\nsns.regplot(x=seventies[\u0026#39;inflation_rate\u0026#39;], y=seventies[\u0026#39;unemployment_rate\u0026#39;]) \u0026lt;Axes: xlabel='inflation_rate', ylabel='unemployment_rate'\u0026gt; Fig 5: A Regression Plot of the Same The Short-run # Economists do not hold the traditional Phillips Curve to be false, but merely a short-run theory. Now I\u0026rsquo;m going to test if this is true: where short-run is equal to one year, does a strong negative correlation between unemployment and inflation hold?\nTo do this, we need to get monthly YoY inflation data. FRED gives us \u0026lsquo;Consumer Price Index for All Urban Consumers: All Items in U.S. City Average\u0026rsquo;. 
We will need to convert this to percentage changes, but it should work.\ncpi_monthly = pd.read_csv(\u0026#39;inf_monthly.csv\u0026#39;, index_col=\u0026#39;observation_date\u0026#39;, parse_dates=True).rename(columns={\u0026#39;CPIAUCSL\u0026#39;: \u0026#39;CPI\u0026#39;}) inf_monthly = (cpi_monthly.pct_change(12) * 100).dropna().rename(columns={\u0026#39;CPI\u0026#39;: \u0026#39;inflation_rate\u0026#39;}) inf_monthly inflation_rate observation_date 1948-01-01 10.242086 1948-02-01 9.481961 1948-03-01 6.818182 1948-04-01 8.272727 1948-05-01 9.384966 ... ... 2025-09-01 3.022572 2025-11-01 2.696444 2025-12-01 2.653304 2026-01-01 2.391201 2026-02-01 2.434004 937 rows × 1 columns\nNow that we have our data, we can get the average negative correlation between inflation and unemployment over twelve-month periods.\nunemp_inf_monthly = pd.merge(unemp, inf_monthly, \u0026#39;right\u0026#39;, on=\u0026#39;observation_date\u0026#39;) unemp_inf_monthly.dropna() unemployment_rate inflation_rate observation_date 1948-01-01 3.4 10.242086 1948-02-01 3.8 9.481961 1948-03-01 4.0 6.818182 1948-04-01 3.9 8.272727 1948-05-01 3.5 9.384966 ... ... ... 2025-09-01 4.4 3.022572 2025-11-01 4.5 2.696444 2025-12-01 4.4 2.653304 2026-01-01 4.3 2.391201 2026-02-01 4.4 2.434004 937 rows × 2 columns\nuninf_corr = unemp_inf_monthly.groupby(unemp_inf_monthly.index.year).corr().iloc[0::2,-1].unstack() sns.lineplot(uninf_corr) \u0026lt;Axes: xlabel='observation_date'\u0026gt; Fig 6: Correlation b/w Unemployment and Inflation Within Each Year uninf_corr.mean() unemployment_rate -0.139759 dtype: float64 This shows us that, within 12-month periods, there is no strong correlation between unemployment and inflation. If unemployment rises in a month, inflation does not rise simultaneously.\nThe biggest flaw with this analysis is, of course, that it fails to account for a time gap. 
Moreover, within 12-month periods there are only twelve total samples; averaging these does not make the results especially meaningful.\nInstead of intra-short-run periods, let\u0026rsquo;s try a 3-month gap/lag between unemployment and inflation - how does unemployment rising in one month correlate with inflation three months on?\nAdding a Lag # lagged_unempinf_monthly = unemp_inf_monthly.copy() lagged_unempinf_monthly[\u0026#39;inflation_rate\u0026#39;] = unemp_inf_monthly[\u0026#39;inflation_rate\u0026#39;].shift(-3) lagged_unempinf_monthly[\u0026#39;unemployment_rate\u0026#39;].corr(lagged_unempinf_monthly[\u0026#39;inflation_rate\u0026#39;]) np.float64(0.04095703974523833) A weak positive correlation. Let\u0026rsquo;s try generalising this.\nlags = range(24) # two year range lag_correlations = {} lagged_unempinf_monthly = unemp_inf_monthly.copy() for lag in lags: lagged_unempinf_monthly[\u0026#39;inflation_rate\u0026#39;] = unemp_inf_monthly[\u0026#39;inflation_rate\u0026#39;].shift(-lag) lag_correlations[lag] = lagged_unempinf_monthly[\u0026#39;unemployment_rate\u0026#39;].corr(lagged_unempinf_monthly[\u0026#39;inflation_rate\u0026#39;]) pd.DataFrame.from_dict(lag_correlations, orient=\u0026#39;index\u0026#39;, columns=[\u0026#39;correlation\u0026#39;]) correlation 0 0.053951 1 0.048228 2 0.044783 3 0.040957 4 0.040465 5 0.042552 6 0.046893 7 0.054609 8 0.063986 9 0.075217 10 0.088234 11 0.102423 12 0.115654 13 0.124849 14 0.133213 15 0.141117 16 0.148507 17 0.155698 18 0.162488 19 0.166092 20 0.167879 21 0.168585 22 0.167955 23 0.166077 The positive correlation seems to grow with the lag - meaning an increase in unemployment is correlated with an increase in inflation, generally, 0-23 months down the road. This seems counterintuitive.\nOne possible reason for this is that we average over the entire 1948-2026 period. 
Let\u0026rsquo;s try breaking it down into different economic periods.\neras = { \u0026#39;pre_stagflation\u0026#39;: (\u0026#39;1960-01-01\u0026#39;, \u0026#39;1969-12-01\u0026#39;), \u0026#39;stagflation\u0026#39;: (\u0026#39;1970-01-01\u0026#39;, \u0026#39;1983-12-01\u0026#39;), \u0026#39;great_moderation\u0026#39;:(\u0026#39;1984-01-01\u0026#39;, \u0026#39;2007-12-01\u0026#39;), \u0026#39;post_gfc\u0026#39;: (\u0026#39;2008-01-01\u0026#39;, \u0026#39;2026-02-01\u0026#39;), } lags = range(25) # two year range lag_correlations = [] for era, periods in eras.items(): era_data = unemp_inf_monthly[periods[0]:periods[1]].dropna() for lag in lags: lag_correlations.append([era, lag, era_data[\u0026#39;unemployment_rate\u0026#39;].corr(era_data[\u0026#39;inflation_rate\u0026#39;].shift(-lag))]) lag_correlations_df = pd.DataFrame(lag_correlations, columns=[\u0026#39;Era\u0026#39;, \u0026#39;Lag\u0026#39;, \u0026#39;Correlation\u0026#39;]) lag_correlations_df Era Lag Correlation 0 pre_stagflation 0 -0.807014 1 pre_stagflation 1 -0.816310 2 pre_stagflation 2 -0.827009 3 pre_stagflation 3 -0.837270 4 pre_stagflation 4 -0.847746 ... ... ... ... 95 post_gfc 20 0.076844 96 post_gfc 21 0.071699 97 post_gfc 22 0.063336 98 post_gfc 23 0.050598 99 post_gfc 24 0.027397 100 rows × 3 columns\nThis gets us the correlation between inflation and unemployment, by era, across time lags from 0 to 24 months apart. 
Let\u0026rsquo;s view this data era-by-era.\nsns.lineplot(x=lag_correlations_df[\u0026#39;Lag\u0026#39;], y=lag_correlations_df[\u0026#39;Correlation\u0026#39;], hue=lag_correlations_df[\u0026#39;Era\u0026#39;]) \u0026lt;Axes: xlabel='Lag', ylabel='Correlation'\u0026gt; Fig 7: Correlation b/w Unemployment and Inflation, by Lag and Era lag_correlations_df.groupby(\u0026#39;Era\u0026#39;).apply(lambda x : x.nsmallest(1, \u0026#39;Correlation\u0026#39;)) # The lowest correlation and its lag, by era Lag Correlation Era great_moderation 62 12 0.061450 post_gfc 75 0 -0.420974 pre_stagflation 8 8 -0.859855 stagflation 37 12 -0.408968 The lessons of this analysis are interesting. For one, we see that in all eras save for the great moderation, we have a lag that corresponds to a reasonable negative correlation between inflation and unemployment. Save for the post-Great Recession era, the lag at which changes in unemployment are most negatively correlated with changes in inflation seems to be between 8 and 12 months.\nThe pre-stagflation era tells a simple story: unemployment is strongly negatively correlated with inflation at any lag, but especially around 8 months in.\nThe stagflation era does not turn out to be the empirical disprover it seemed to be: at a lag of 12 months, inflation and unemployment correlate with r = -0.41. In the social sciences, this is a moderately strong correlation.\nThe great moderation era is the only one where inflation and unemployment have no link at any lag. This era is associated with prudent monetary policy, most notably, with milder business cycles and generally less volatility in economic variables like inflation and unemployment.\nThe post-GFC era\u0026rsquo;s \u0026lsquo;strongest\u0026rsquo; lag is 0 months. That is to say, changes in unemployment immediately affect inflation. This was also a complicated period: the main factors in this anomaly are likely the Great Recession and its aftermath, and COVID.\nWhat explains the GFC? 
# sns.lineplot(unemp_inf[eras[\u0026#39;post_gfc\u0026#39;][0]:eras[\u0026#39;post_gfc\u0026#39;][1]]) \u0026lt;Axes: xlabel='observation_date'\u0026gt; Fig 8: Unemployment and Inflation during the Great Recession and Aftermath Inflation and unemployment both started off high in the GFC era. As the economy plunged into recession, it suffered a collapse in employment and deflation - simultaneously. This is likely due to the special nature of the crisis that occurred in 2008.\nIn the aftermath and recovery, unemployment came down from its heights while inflation stayed stable. This is because the economy was operating below full employment, or at an unemployment gap. Closing this gap between current unemployment and the natural rate of unemployment should not generate inflationary pressure, and it did not until 2015, when we see a simultaneous increase in inflation and a continued decrease in unemployment past pre-crisis levels.\nThis decrease continued up until COVID. COVID was a highly unusual crisis. The economy suffered highs in inflation you wouldn\u0026rsquo;t usually see during a period of recession, because of supply-side issues caused by the pandemic. There is a brief lag here, which probably reflects the time between lockdowns -\u0026gt; supply-side issues. The collapse in inflation and unemployment, and the lag between those two, can be put down to the end of lockdowns, (the resolution of) supply-side issues, and also fiscal stimulus.\nApplying OLS Regression # A linear regression involves fitting a curve to a bunch of data points to see the relationship between two variables, or in our case unemployment and inflation. Invariably, there will be some level of error (or residual) in each prediction: the difference between our predicted value of y given x and the actual value of y in a given data point of x. 
An OLS regression minimises the sum of these residuals (squared, so that positive and negative residuals don\u0026rsquo;t cancel out) between our data points and the line we\u0026rsquo;re fitting to them.\nThe main numbers we care about are:\nR-squared, or how much of the variation in inflation is explained by unemployment\ncoef, or slope, of unemployment_rate\nP\u0026gt;|t| of unemployment_rate, which measures whether the relationship is statistically significant (p\u0026lt;0.05)\nx = unemp_inf_monthly.unemployment_rate x = sm.add_constant(x) y = unemp_inf_monthly.inflation_rate result = sm.OLS(y, x).fit() print(result.summary()) OLS Regression Results ============================================================================== Dep. Variable: inflation_rate R-squared: 0.003 Model: OLS Adj. R-squared: 0.002 Method: Least Squares F-statistic: 2.729 Date: Fri, 03 Apr 2026 Prob (F-statistic): 0.0988 Time: 20:50:14 Log-Likelihood: -2315.4 No. Observations: 937 AIC: 4635. Df Residuals: 935 BIC: 4644. Df Model: 1 Covariance Type: nonrobust ===================================================================================== coef std err t P\u0026gt;|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 3.0046 0.325 9.245 0.000 2.367 3.642 unemployment_rate 0.0908 0.055 1.652 0.099 -0.017 0.199 ============================================================================== Omnibus: 210.433 Durbin-Watson: 0.025 Prob(Omnibus): 0.000 Jarque-Bera (JB): 414.374 Skew: 1.295 Prob(JB): 1.05e-90 Kurtosis: 4.977 Cond. No. 21.1 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. An R^2 of 0.003 tells the same story as our earlier analysis: there is essentially no relationship between the two variables. The slope is slightly positive, and P\u0026gt;|t| (0.099) tells us it is statistically insignificant.
But what about a by-era analysis?\nPre-Stagflation # x = unemp_inf_monthly[\u0026#39;1960-01-01\u0026#39;:\u0026#39;1969-12-01\u0026#39;].unemployment_rate x = sm.add_constant(x) y = unemp_inf_monthly[\u0026#39;1960-01-01\u0026#39;:\u0026#39;1969-12-01\u0026#39;].inflation_rate result = sm.OLS(y, x).fit() print(result.summary()) OLS Regression Results ============================================================================== Dep. Variable: inflation_rate R-squared: 0.651 Model: OLS Adj. R-squared: 0.648 Method: Least Squares F-statistic: 220.4 Date: Fri, 03 Apr 2026 Prob (F-statistic): 9.18e-29 Time: 20:50:14 Log-Likelihood: -152.39 No. Observations: 120 AIC: 308.8 Df Residuals: 118 BIC: 314.4 Df Model: 1 Covariance Type: nonrobust ===================================================================================== coef std err t P\u0026gt;|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 7.6044 0.364 20.905 0.000 6.884 8.325 unemployment_rate -1.1027 0.074 -14.845 0.000 -1.250 -0.956 ============================================================================== Omnibus: 10.070 Durbin-Watson: 0.120 Prob(Omnibus): 0.007 Jarque-Bera (JB): 10.964 Skew: 0.738 Prob(JB): 0.00416 Kurtosis: 2.888 Cond. No. 23.4 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. Here we see a strong relationship, as expected: an R^2 of 0.651, and a negative slope of magnitude 1.1027, indicating a 1.1 percentage-point fall in inflation for every 1-point rise in unemployment.
The relationship is highly statistically significant.\nStagflation # x = unemp_inf_monthly[\u0026#39;1970-01-01\u0026#39;:\u0026#39;1983-12-01\u0026#39;].unemployment_rate x = sm.add_constant(x) y = unemp_inf_monthly[\u0026#39;1970-01-01\u0026#39;:\u0026#39;1983-12-01\u0026#39;].inflation_rate result = sm.OLS(y, x).fit() print(result.summary()) OLS Regression Results ============================================================================== Dep. Variable: inflation_rate R-squared: 0.007 Model: OLS Adj. R-squared: 0.001 Method: Least Squares F-statistic: 1.124 Date: Fri, 03 Apr 2026 Prob (F-statistic): 0.291 Time: 20:50:14 Log-Likelihood: -432.89 No. Observations: 168 AIC: 869.8 Df Residuals: 166 BIC: 876.0 Df Model: 1 Covariance Type: nonrobust ===================================================================================== coef std err t P\u0026gt;|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 8.5759 1.098 7.811 0.000 6.408 10.744 unemployment_rate -0.1649 0.156 -1.060 0.291 -0.472 0.142 ============================================================================== Omnibus: 19.229 Durbin-Watson: 0.019 Prob(Omnibus): 0.000 Jarque-Bera (JB): 9.954 Skew: 0.415 Prob(JB): 0.00689 Kurtosis: 2.145 Cond. No. 32.0 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. An R^2 of 0.007, a slope of -0.1649, and a very high p-value mean this relationship is weak and statistically insignificant.\nGreat Moderation # x = unemp_inf_monthly[\u0026#39;1984-01-01\u0026#39;:\u0026#39;2007-12-01\u0026#39;].unemployment_rate x = sm.add_constant(x) y = unemp_inf_monthly[\u0026#39;1984-01-01\u0026#39;:\u0026#39;2007-12-01\u0026#39;].inflation_rate result = sm.OLS(y, x).fit() print(result.summary()) OLS Regression Results ============================================================================== Dep.
Variable: inflation_rate R-squared: 0.025 Model: OLS Adj. R-squared: 0.022 Method: Least Squares F-statistic: 7.431 Date: Fri, 03 Apr 2026 Prob (F-statistic): 0.00681 Time: 20:50:14 Log-Likelihood: -427.18 No. Observations: 288 AIC: 858.4 Df Residuals: 286 BIC: 865.7 Df Model: 1 Covariance Type: nonrobust ===================================================================================== coef std err t P\u0026gt;|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 2.1690 0.350 6.189 0.000 1.479 2.859 unemployment_rate 0.1651 0.061 2.726 0.007 0.046 0.284 ============================================================================== Omnibus: 9.285 Durbin-Watson: 0.098 Prob(Omnibus): 0.010 Jarque-Bera (JB): 9.699 Skew: 0.448 Prob(JB): 0.00783 Kurtosis: 2.916 Cond. No. 33.1 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. An R^2 of 0.025, with a positive slope and a statistically significant relationship (p = 0.007).\nPost-GFC # x = unemp_inf_monthly[\u0026#39;2008-01-01\u0026#39;:\u0026#39;2026-02-01\u0026#39;].unemployment_rate x = sm.add_constant(x) y = unemp_inf_monthly[\u0026#39;2008-01-01\u0026#39;:\u0026#39;2026-02-01\u0026#39;].inflation_rate result = sm.OLS(y, x).fit() print(result.summary()) OLS Regression Results ============================================================================== Dep. Variable: inflation_rate R-squared: 0.177 Model: OLS Adj. R-squared: 0.173 Method: Least Squares F-statistic: 46.31 Date: Fri, 03 Apr 2026 Prob (F-statistic): 9.89e-11 Time: 20:50:14 Log-Likelihood: -435.97 No.
Observations: 217 AIC: 875.9 Df Residuals: 215 BIC: 882.7 Df Model: 1 Covariance Type: nonrobust ===================================================================================== coef std err t P\u0026gt;|t| [0.025 0.975] ------------------------------------------------------------------------------------- const 4.7076 0.348 13.545 0.000 4.023 5.393 unemployment_rate -0.3749 0.055 -6.805 0.000 -0.484 -0.266 ============================================================================== Omnibus: 32.834 Durbin-Watson: 0.068 Prob(Omnibus): 0.000 Jarque-Bera (JB): 42.705 Skew: 1.009 Prob(JB): 5.33e-10 Kurtosis: 3.809 Cond. No. 18.2 ============================================================================== Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified. A modest R^2 of 0.177, with a negative slope and high statistical significance. In this era, a 1-point increase in unemployment was associated with a 0.37 percentage-point decrease in inflation.\nOur data is a monthly time series, and inflation and unemployment change little from month to month (\u0026lsquo;autocorrelation\u0026rsquo;). OLS assumes independence - that each data point of inflation and unemployment is unrelated to the previous one; this is not true of our data. We can see this in the very low Durbin-Watson values. Autocorrelation in an OLS regression deflates the standard errors, inflating the t-statistics and R^2 and shrinking the p-values - overstating both the strength of the link between our variables and its statistical significance. In sum, these results are not precise.\nConclusion # In this notebook, we started by looking at US economic data generally to measure the relationship between unemployment and inflation. We found that, in the aggregate, no such relationship seemed to exist. We then broke down our data into four eras and looked at them separately, enabling a more tailored analysis. We started by simply measuring correlation by era, then added lags. Finally, we tried out an OLS regression (with limitations) by era.
Our dataset was limited exclusively to the US and did not vary; our techniques did vary, each with its own limitations (see their sections). These limitations should be taken into account, but we can draw some conclusions with reasonable confidence.\nFor the pre-stagflation era, there was a very tight negative correlation between inflation and unemployment. This peaked at 8 months of lag - a positive change in unemployment correlates strongly with a negative change in inflation eight months on. Our OLS regression was not lagged, but it also gave us a slope of -1.1: a 1-point rise in unemployment is associated with a 1.1 percentage-point fall in inflation.\nFor the stagflation era, the relationship broke down. Our OLS regression found a low R^2 and low statistical significance. However, adding a 12-month lag gave us a moderately strong correlation of -0.41, indicating the relationship existed but was quite lagged.\nFor the great moderation era, we found a statistically significant positive correlation between inflation and unemployment. This relationship is explainable quite simply: the Federal Reserve was looking at the same data as us. Under Volcker and his successors, the Fed worked rigorously to keep inflation under control - as a result, any rise in inflation would prompt a tightening of policy and a rise in unemployment, and vice versa, producing the positive correlation.\nThe post-GFC era got its own section in this notebook. The gist is that the noise in the data comes from two major events. First, the GFC itself - or rather its recovery: in the slow recovery that followed, the economy took its time returning to its natural rate of unemployment (NRU), and in the meantime inflation stayed stable while unemployment steadily decreased, which weakens the measured relationship between our variables. Second, COVID, a unique event that delivered both a demand and a supply shock: with a brief lag, unemployment rose and so too did inflation, further complicating our data.
But the general idea is that the Phillips Curve should not simply be dismissed. Testing macroeconomic hypotheses is always complex: there actually isn\u0026rsquo;t that much data, given how many factors change country-to-country and year-to-year. Overall, though, the tight relationship in the pre-stagflation era, and the fact that we can explain away every era save for stagflation, should give us some confidence. On the other hand, in recent times the Phillips Curve has been on a losing streak - maybe its time is simply over.\n","date":"3 April 2026","externalUrl":null,"permalink":"/posts/testing-phillips-curve/","section":"Posts","summary":"","title":"Testing the Phillips Curve","type":"posts"},{"content":"A notebook that models and analyses my personal finances for my first year as an incoming University of Toronto, Faculty of Arts and Sciences student. Modeling in pandas rather than a spreadsheet offers a few benefits beyond practice: it gives access to advanced statistical techniques not easily available in Excel, to advanced visualisation libraries, and so on. I have not utilised most of these features yet. I intend to extend this analysis further with a Monte Carlo simulation (as opposed to scenario-based modeling) in the future, for instance.\nStructure # The notebook consists of cost, revenue, and net dataframes for base, pessimistic, and optimistic forecasts. In total, there are 9 such \u0026lsquo;core\u0026rsquo; dataframes. Costs are categorised largely according to the UofT Financial Planner. So too are revenues. I estimated OSAP and UTAPS grants and loans, and zeroed scholarships and family support for simplicity\u0026rsquo;s sake. My main revenues aside from these are: summer work, expressed as a starting balance added to my running balance in the net dataframes; and part-time work, expressed as a constant revenue source.
More on these later.\nDate Net Running Balance\n2026-09-01 1144.0 6424.0\n2026-10-01 1922.0 8346.0\n2026-11-01 -1078.0 7268.0\n2026-12-01 -1078.0 6190.0\n2027-01-01 -1000.0 5190.0\n2027-02-01 -1078.0 4112.0\n2027-03-01 -1078.0 3034.0\n2027-04-01 -1078.0 1956.0\nTable 1.1: Net Dataframe for Pessimistic Scenario\nFig 1.1: Net Monthly Revenue by Scenario\nAfter inputting all the relevant data and consolidating it into the three main models, I got to the core of my analysis: using aggregate metrics to analyse my forecast finances. I judged my finances by the following metrics:\nFinal balance\nMinimum balance\nMaximum deficit\nWorst month\nMonths negative\nRequired buffer\nScenario Final Balance Minimum Balance Max Deficit Worst Month Months Negative Required Buffer\nBase 13948.0 11927.0 -151.0 2026-11-01 0 0\nOptimistic 22140.0 15415.0 521.0 2026-11-01 0 0\nPessimistic 1956.0 1956.0 -1078.0 2026-11-01 0 0\nTable 1.2: Metrics by Scenario\nI also performed a sensitivity analysis on the two main variable revenues in my budget: part-time and summer work. The intention was to determine how far I could bend these variables and stay in the green. I visualised this in heatmaps and a feasible frontier curve.\nFig 1.2: Heatmap of Minimum Balance by Summer and Part-time Earnings\nFig 1.3: Heatmap of Months Negative by Summer and Part-time Earnings\nPart-time Earnings Summer Earnings\n0 6500\n100 5500\n200 5000\n300 4000\n400 3000\n500 2500\n600 1500\n700 1000\n800 0\n900 0\n1000 0\nTable 1.3: Feasible Frontier\nThe final aspect of this notebook was a residence cost comparison, followed by a weighted comparison.
For the former, I manually input the costs for each residence available to me to see how they would mesh with my broader budget.\nResidence Minimum Balance Final Balance Months Negative Max Deficit\nNew College 2406 2406.0 0 -3014.5\nWoodsworth 1956 1956.0 0 -1078.0\nKnox 1475 1475.0 0 -2183.0\nUniversity College 823 823.0 0 -3806.0\nChestnut -2559 -2560.0 4 -3797.0\nTrinity -5196 -5197.0 4 -6816.0\nOak -6962 -6963.0 4 -7699.0\nTable 1.4: Key Metrics by Residence\nAfter that, I had three LLMs (Claude 4.6, GPT-5.3, and Deepseek) rank the (first four) viable residences across a few similar categories. I got the following results:\nResidence Cost Room \u0026amp; Amenities Community Food Location\nWoodsworth 2 1 3 1 3\nKnox 3 4 4 3 4\nUC 4 3 1 4 1\nNew College 1 2 2 2 2\nTable 1.5: LLM Rankings of Residences by Category\nI weighted these categories according to my preferences, and then derived a ranking of residences.\nResidence Weighted Ranking\nWoodsworth 1.55\nNew College 1.95\nUC 2.75\nKnox 3.75\nTable 1.6: Weighted Ranking of Residences\nAnalysis # What lessons did I derive from this? I got a viable ranking of residences, which I have used in my StarRez application. OSAP and UTAPS applications ask the applicant for estimates of, among other things, their starting assets and expected work earnings throughout the school year; I now have a model that can inform the values I give. I also have an idea of how hard I\u0026rsquo;ll need to work, this summer and during the school year, in order to stay afloat, which is extremely useful. Finally, I have a solid budget template, one I can refine with my actual data in second year.\nGenerally, my finances are viable so long as I either earn $1000 per month during the school year, save up $6500 this summer, or (in the middle) save $2500 and earn $500/month during the school year.
This is all according to my pessimistic cost forecasts, so this is a worst-case analysis.\nIt does, however, use OSAP and UTAPS estimates that are not guaranteed. I will have to update these with actual figures once I\u0026rsquo;ve applied. Moreover, it treats OSAP loans as if they were revenue - loans are liabilities.\n","date":"29 March 2026","externalUrl":null,"permalink":"/posts/modeling-first-year-finances/","section":"Posts","summary":"","title":"Modeling my first-year finances","type":"posts"},{"content":"","externalUrl":null,"permalink":"/authors/","section":"Authors","summary":"","title":"Authors","type":"authors"},{"content":"","externalUrl":null,"permalink":"/categories/","section":"Categories","summary":"","title":"Categories","type":"categories"},{"content":" Title Description References MCQambridge A web app that helps automate solving and checking Cambridge O/A-level and IGCSE MCQ past papers. github Modeling First-Year Finances A Jupyter notebook that models my personal finances for my first year at UofT, containing many cool metrics with immediate decision-making utility. notebook summary Empirical Economics A series of data-driven economic analyses using Python and pandas. page ","externalUrl":null,"permalink":"/projects/","section":"","summary":"","title":"Projects","type":"page"},{"content":"","externalUrl":null,"permalink":"/tags/","section":"Tags","summary":"","title":"Tags","type":"tags"}]