Dimensional Modeling 101 - Purdue University

COURSE ENROLLMENT MODEL
AN AGENT-BASED SOLUTION METHODOLOGY
Ian Pytlarz, Senior Data Scientist
Scott Pu, Data Scientist

Agent-Based Model Overview
A COMPLEX SYSTEM, EMULATING REALITY
- Agent Generation: the essential building blocks of the model
- PACE (Program Analysis for Completion Engine): use degree progress in the prediction
- Probability Generation: put together available data to determine what students may take
- Modeling: every piece coming together

Agent Generation
THE ESSENTIAL BUILDING BLOCKS OF THE MODEL

An agent-based model needs agents, who will choose courses in the final model. To determine our agents, a Markov chain model designed by Enrollment Management is used to evaluate enrollments across the university. Students in this model are probabilistic (perhaps 80% likely to be a CS major and 20% likely to be MECH), which leads to multiple weighted agents being created.

Markov Chain Model
  Agent   Major   BOAP   Agent Weight
  Jim     CS      02     0.8
  Jim     MECH    02     0.2
  Jane    CHM     04     0.9
  Jane    ENGL    04     0.1

Agents will eventually pick courses, but we don't know beforehand how many courses a student will choose. An XGBoost model predicts the number of courses a student will choose, based on inputs similar to those of a grades model.

XGBoost
  Agent   College   BOAP   # Courses
  Jim     CS        02     5
  Jim     MECH      02     5
  Jane    CHM       04     4
  Jane    ENGL      04     4

PACE Curricular Structure Analysis
PROGRAM ANALYSIS FOR COMPLETION ENGINE

PACE lets us represent a curriculum as a data structure, which enables automatic comparison and the extraction of meaningful information, such as a student's progress.
[Figure: MGMT example]

How Does PACE Work?
DISCIPLINES, BLOCKS, RULES & QUALIFIERS

PACE works by parsing and compiling SCRIBE code.

There is a data structure for each discipline, determined by a student's program of study:

  Discipline
    Blocks, Qualifiers
      Rules, Qualifiers
        Sub-rules*, Qualifiers

Once parsed, all of this information is saved into a database for further analysis and use.

How Is PACE Used Here?
DEGREE COMPLETION AS A PREDICTIVE VARIABLE

Using PACE, we calculate each student's progress through their discipline. These progress percentages are grouped into 8 bins so that missing (null) values can be filled in by a classification model. The bins are used later to help determine which courses a student will pick in the model.
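The binning step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the slides give the number of bins (8) but not the bin edges, so equal-width bins over 0-100% are an assumption, a missing completion value is represented as `None`, and `classify` is a hypothetical stand-in for whatever classification model fills the nulls.

```python
from typing import Callable, Optional, Sequence

N_BINS = 8  # from the slides; the equal-width edges below are an assumption


def bin_progress(pct: Optional[float], n_bins: int = N_BINS) -> Optional[int]:
    """Map a PACE completion percentage (0-100) to a bin index 0..n_bins-1.

    Returns None for a missing value so that it can be imputed later.
    """
    if pct is None:
        return None
    if not 0.0 <= pct <= 100.0:
        raise ValueError(f"completion must be between 0 and 100, got {pct}")
    width = 100.0 / n_bins
    # 100% would otherwise land in a ninth bin, so clamp it into the top one
    return min(int(pct / width), n_bins - 1)


def fill_missing_bins(
    bins: Sequence[Optional[int]],
    features: Sequence[dict],
    classify: Callable[[dict], int],  # hypothetical: a trained classifier's predict
) -> list[int]:
    """Replace each None bin with the classifier's predicted bin for that student."""
    return [b if b is not None else classify(f) for b, f in zip(bins, features)]
```

For example, completions of 0%, 40%, missing, and 100% map to bins 0, 3, `None`, and 7; the `None` is then handed to the classifier.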

Probability Generation
DETERMINING WHAT STUDENTS WILL TAKE

Before the model gets going, we need to give our agent population choices to make. Each student gets a set of probabilities for picking courses. The probabilities are based on three criteria:
- Major & BOAP historical frequencies
- Major & years-enrolled historical frequencies
- Discipline & PACE-completion historical frequencies
These three sets of probabilities produce different models, which are weighted together using parameters tuned by the data scientist.

Using historical course-taking patterns, we create a set of probabilities for what students will take.

Major: CS, BOAP: 02
  Course   Freq   Choice   p(x|s)
  CS190    100    500      0.17
  CS191    100    500      0.17
  CS180    100    400      0.22
  MA161    50     350      0.12
  MA261    50     450      0.10

Notes on the table:
- Students who retook the class are counted in the denominator, but students who took the class and did not retake it are not. This helps estimate retake rates.
- We also do this for prerequisites.

Then we remove courses the student can't take, based on prerequisites, and normalize the remaining probabilities when assigning them to an agent.

Agent: Jim (Major: CS, BOAP: 02). Jim can't take MA161.
  Course    p(x)
  CS190     0.20
  CS191     0.20
  CS180     0.25
  MA161     -
  MA261     0.11
  ENGL106   0.14
  CS240     0.10

Modeling
BRINGING THE PIECES TOGETHER

We now have sets of agents, each with a set of course probabilities. The model itself is quite simple, with one model for each set of agents:

    for epoch in 1..N:
        for agent in agent_data:
            for choice in 1..agent.num_courses:
                generate a random number in [0, 1]
                choose a course from agent_data using that number
        sum course enrollments for the epoch
    average epoch.course_enrollments across all epochs
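The loop above translates almost line for line into Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Agent` fields, the linear blending in `blend()`, and counting each choice at the agent's Markov-chain weight are our own choices, since the slides do not say how agent weights enter the totals. `random.choices` makes one independent draw per course slot with replacement, which mirrors the known issue that a student can pick the same course twice.

```python
import random
import statistics
from collections import Counter
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    weight: float       # Markov-chain weight, e.g. 0.8 for "Jim as a CS major"
    num_courses: int    # predicted course count (XGBoost on the slides)
    course_probs: dict  # course -> p(x), already filtered and normalized


def blend(tables, weights):
    """Linearly blend several course-probability tables (assumed combination rule)."""
    courses = sorted({c for table in tables for c in table})  # sorted for reproducibility
    return {c: sum(w * t.get(c, 0.0) for t, w in zip(tables, weights)) for c in courses}


def eligible_probs(probs, cannot_take):
    """Drop courses the agent cannot take (e.g. missing prerequisites) and renormalize."""
    kept = {c: p for c, p in probs.items() if c not in cannot_take and p > 0}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("agent has no eligible courses")
    return {c: p / total for c, p in kept.items()}


def simulate(agents, n_epochs, seed=None):
    """Run n_epochs independent simulations; return course -> (mean, stdev)."""
    rng = random.Random(seed)
    epochs = []
    for _ in range(n_epochs):
        counts = Counter()
        for agent in agents:
            courses = list(agent.course_probs)
            weights = list(agent.course_probs.values())
            # one independent draw per course slot, with replacement
            for course in rng.choices(courses, weights=weights, k=agent.num_courses):
                counts[course] += agent.weight  # assumption: weighted agents add fractional seats
        epochs.append(counts)  # per-epoch course totals
    all_courses = sorted({c for counts in epochs for c in counts})
    return {
        c: (statistics.mean([e[c] for e in epochs]),
            statistics.pstdev([e[c] for e in epochs]))
        for c in all_courses
    }


if __name__ == "__main__":
    # Jim as a CS major (weight 0.8, 5 courses), using his normalized p(x) table
    jim_cs = Agent("Jim", 0.8, 5, {"CS190": 0.20, "CS191": 0.20, "CS180": 0.25,
                                   "MA261": 0.11, "ENGL106": 0.14, "CS240": 0.10})
    for course, (mean, sd) in simulate([jim_cs], n_epochs=1000, seed=42).items():
        print(f"{course}: {mean:.2f} +/- {sd:.2f}")
```

Keeping each epoch's totals separate, rather than accumulating one running sum, is what makes a per-course standard deviation possible.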

Example: Jim (CS)
  Course    Cume p(x)
  CS190     0.35
  CS191     0.50
  CS180     0.62
  MA261     0.80
  ENGL106   0.97
  CS240     1.00
The model rolls 0.7, which falls in MA261's interval (above 0.62, up to 0.80), so Jim picks MA261.
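The roll-and-lookup step is an inverse-CDF draw on the cumulative table: a roll selects the first course whose cumulative probability is at least the roll. Below is a minimal sketch using the standard library's `bisect`; the function names are ours, and the tie rule (a roll exactly on a boundary goes to the lower course) is an assumption the slide does not settle.

```python
from bisect import bisect_left
from itertools import accumulate


def cumulative(probs):
    """Turn per-course probabilities into a running total (a discrete CDF)."""
    return list(accumulate(probs))


def pick_course(courses, cume, roll):
    """Return the first course whose cumulative probability is >= roll."""
    if not 0.0 <= roll <= 1.0:
        raise ValueError("roll must be in [0, 1]")
    i = bisect_left(cume, roll)
    # clamp in case floating-point round-off leaves cume[-1] slightly below 1.0
    return courses[min(i, len(courses) - 1)]


# Jim's cumulative table from the example above
courses = ["CS190", "CS191", "CS180", "MA261", "ENGL106", "CS240"]
cume = [0.35, 0.50, 0.62, 0.80, 0.97, 1.00]
print(pick_course(courses, cume, 0.7))  # MA261
```

Because `bisect_left` is a binary search, each draw costs O(log n) in the number of courses, which matters when every agent rolls several times per epoch.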

Combine the models for major/BOAP, major/year, and discipline/completion.
Weights (as listed on the slide): 0.3, 0.5, 0.2

  Course    Major & BOAP p(x)   Disc. & Comp. p(x)   Major & Year p(x)
  CS190     0.20                0.40                 0.25
  CS191     0.20                0.10                 0.25
  CS180     0.25                0.15                 0.20
  MA161     -                   -                    -
  MA261     0.11                0.15                 0.11
  ENGL106   0.14                0.09                 0.14
  CS240     0.10                0.10                 0.05

Course Enrollment Model
COMPLETE BASIC OVERVIEW

Course Data & Major/BOAP → PACE → Agent Generation → Choice Probabilities → Model → Results

Results
  Course   Avg Enroll   Stdev Enroll
  CS190    102          5.43
  CS191    98           3.74

Methodological Benefits
WHAT DOES THIS COMPLEXITY DO FOR US?

Individual simulations allow for distributions of results. Because each simulation is kept separate, we can see how enrollments might be spread out; for instance, we can put a standard deviation on each prediction and find which predictions we are most confident in.

The approach also lets us take advantage, in the future, of extremely individualized data. PACE completion is just one small part of what could be used; entire curricular requirements could be built into demand.

Next Steps & Known Issues
HOW DO WE IMPROVE FURTHER?

Known Issues
- Students are currently allowed to take a course more than once.
- A proper joint probability was hard to solve; this could cause issues if the model is used improperly.
- True multi-part agent predictions are computationally difficult without further optimization. Current projections are made by taking an actual prior population and tweaking it by sampling from that population. This means that partial agents, as designed, can't truly exist in the system, not without waiting 8 hours to calculate them, which makes usage and testing very difficult.

Potential Improvements/Next Steps
- Include curricular requirements as a fourth plank of probability generation.

- Multiple majors are ignored; their handling should be more nuanced.
- Incoming students don't get any credits in the current model (the fall problem). Another model will be needed to fill this gap.
- Freshmen no longer freely choose courses; they are pre-registered. Modeling this system will improve predictions for freshman-heavy courses.

Discussion

This methodology currently represents a foundation to be built upon. The current inputs are very similar to those of a simpler model, which results in only small improvements to the error rates of current predictions. What other data could be added to take advantage of the nuance this methodology allows? We discussed a few examples we intend to try, but there are virtually unlimited potential data sources.
