Meet our 2022 PyMC Google Summer of Code Students

Kunal Ghosh

Project Name

Fast Exact Gaussian Processes

Mentors

photo of Kunal Ghosh

Bio

I am a fourth-year PhD student in computer science and applied physics at Aalto University, Helsinki, Finland. My research involves developing novel machine learning solutions to challenges in computational materials science. I am broadly interested in generative modelling, materials science and deep learning. I also love teaching and have assisted in courses on Bayesian Data Analysis and Deep Learning at Aalto University.

Connecting

Learn more at Kunal’s GSoC blog.

  1. What motivated you to apply for GSoC with PyMC?

    During one of the cold and dark winter evenings in Helsinki I was chatting with some fellow PhD students about my future life plans. I wasn’t sure what exactly I wanted to do, but having written a few large pieces of software for my research projects, I appreciated the importance of writing good quality code. I knew that Osvaldo Martin (one of the core PyMC devs) was working on PyMC, and after one of our group meetings (Osvaldo was doing a post-doc with my PhD supervisor Aki) I asked if there was a possibility to work on PyMC, since it would be a good opportunity for me to learn good software engineering practices and also contribute to open source (I was quite an avid KDE user back in the day). One thing led to another and I applied to GSoC under PyMC and now I’m here :)

  2. Why did you choose your specific project topic?

    I was looking for potential projects in which I had some prior experience and background. Since I have some prior experience with using Gaussian processes and have also implemented them from scratch before, fast exact Gaussian processes were a natural choice! Reasonably familiar, but still with some scope to learn.

  3. How did you get involved in open source software?

    I started quite early, initially as a Linux user in the early 2000s while I was in high school. During my undergraduate years, I set up a free software user group in our university and also organised workshops about free software for scientific work. Subsequently, I participated in KDE’s Season of KDE and then a GSoC for the ownCloud project. But all through, I’ve just had a deep love and appreciation for free and open source software.

  4. What are you expecting or hoping to get out of your GSoC experience?

    I will implement fast exact Gaussian processes in PyMC and have the code ready to be committed to mainline, hopefully by the end of GSoC. But more importantly, I hope to learn good software engineering practices and to continue as a long-term contributor to PyMC.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    I love teaching and working on research problems. I would like to do a part-time post-doc and work as a researcher in industry, prototyping new solutions to challenging industry problems. If any company out there is reading this and wants to have a chat, don’t hesitate to get in touch with me ;). Apart from that, I enrolled in GSoC purely to learn better software engineering practices; I am sure that would be useful regardless of what I end up doing :)

Purna Chandra Mansingh

Project Name

Increase Support for Batched Multivariate Distributions

Mentors

photo of Purna Chandra Mansingh

Bio

I am a final-year student at Hyderabad Central University pursuing a Master’s in Computer Applications. I’ve been working in the fields of data science and machine learning for the past year. I enjoy working on complex problems and am a technical instructor. In my spare time, I enjoy contributing what I’ve learned to open-source projects.

Connecting

Learn more at Purna’s GSoC blog.

  1. What motivated you to apply for GSoC with PyMC?

    GSoC is a place where I can not only apply my existing skills but also learn new ones. And the learning does not stop with technical knowledge. GSoC introduces me to a new paradigm for collaboratively developing code. Furthermore, GSoC is a platform that allows me to build on and hone my current skills, which is what motivated me to apply.

  2. Why did you choose your specific project topic?

    I’m very interested in machine learning and discovered PyMC a while ago; I actually started contributing before I knew about GSoC. I later discovered GSoC and realized this project was a good fit for my skills.

  3. How did you get involved in open source software?

    As I didn’t have the time to actively contribute, I started by fixing minor bugs in libraries and tools I came across in general. I began by sending small pull requests to PyMC, Scikit-learn, Python, Pandas, and other libraries that I had been using while learning Machine Learning. I learned about GSoC project openings in the PyMC organization and applied for it.

  4. What are you expecting or hoping to get out of your GSoC experience?

    I’m hoping to interact and share ideas with some amazing people during the 12 weeks of GSoC. In the end, I hope to have made some wonderful friends from all over the world with whom I can talk about fun projects, get feedback on my code, and just about anything else. I’m hoping to meet nice, knowledgeable, and smart people who are all gathered in one place, united, and working toward a common goal.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    I want to pursue a career in software development and the GSoC experience will help me to gain the skills I need to design and implement large and highly optimized software.

Larry Dong

Project Name

A PyMC Dirichlet Process Submodule via AePPL Enhancements

Mentors

photo of Larry Dong

Bio

I am a second year PhD student in biostatistics at the Dalla Lana School of Public Health at the University of Toronto in Toronto, Canada. My academic interests revolve around dynamic treatment regimes and Bayesian methods. I began my PhD during the pandemic which has allowed me to be immersed in open-source, particularly in the PyMC community. My first GSoC project entailed implementing a Dirichlet Process submodule for PyMC and I’m back for another GSoC to continue this project and to learn more about Aesara and AePPL.

Connecting

Learn more at Larry’s GSoC blog.

  1. What motivated you to apply for GSoC with PyMC?

    I initially found out about the possibility to do GSoC with PyMC by browsing Twitter in March 2020. It was the second remote semester of the first year of my PhD program and it was tiring me out; I knew that I needed a change of scenery from my online studies. Contributing to open source was not an idea that had occurred to me before entering my PhD, yet it was a very appealing one. I knew that it would be a nice opportunity to learn, especially when it comes to programming and contributing to an established codebase, and to interact with members of a community. I wrote a more in-depth blog post about my experience starting a PhD remotely and discovering open source via GSoC: https://larrydong.com/posts/2022-06-18-value-oss/.

  2. Why did you choose your specific project topic?

    In the summer of 2020, I attended an online summer school on Dirichlet Processes and I barely understood anything. However, they seemed interesting and the method somehow appealed to me. As such, I decided to jump head first into implementing functionality for Dirichlet Processes in PyMC, given there was an opportunity.

  3. How did you get involved in open source software?

    I started small, like everyone, by fixing typos and updating trivial things. Even just creating a pull request sometimes took me many attempts. It was during GSoC that I really started getting more involved in open source.

  4. What are you expecting or hoping to get out of your GSoC experience?

    In terms of the project, I would like to have a Dirichlet Process submodule available in PyMC experimental and have a solid foundation in Aesara and AePPL to be a longer-term contributor. However, probably a more important goal of mine would be to continue to foster the nice and inclusive community that first welcomed me when I was going through a hard time.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    My exact career goals are yet to be determined, but GSoC has shown me that there is a world where my graduate education in statistics and my programming skills would be very valuable. I would like to find a job following my graduation (fingers crossed) that makes use of such skills, but, frankly, I don’t know what exact career I’m preparing myself for. I guess an inherent beauty of contributing to open source is that I can perhaps discover future career prospects as I work on and enjoy my GSoC project!

Danh Phan

Project Name

Multi-output Gaussian Processes in PyMC

Mentors

photo of Danh Phan

Bio

Hi, my name is Danh Phan, a PhD candidate at Monash University, Australia. My research focuses on machine learning (Bayesian methods, choice models, tree-based models, and deep neural networks) for intelligent transport systems. I have more than four years of experience working on different machine learning algorithms, and have published several papers in the machine learning field. In addition, I have worked as an instructor at Monash Data Fluency, where I teach hands-on workshops on Python, Git, and High-Performance Computing to research students and staff at Monash University. I have also been working with Bayesian methods in PyMC for nearly two years.

Connecting

Learn more at Danh Phan’s blog.

  1. What motivated you to apply for GSoC with PyMC?

    My first experience working with Bayesian methods (Bayesian Networks, GLM) was learning from PyMC code examples and resources. The useful learning materials and excellent community support helped me a lot in my journey to perform Bayesian analysis. With the great support from the PyMC dev team, I have recently contributed several pull requests (PRs) to the PyMC and Aesara GitHub codebases. Moreover, I want to be involved with this community long-term, to learn and contribute along the way.

  2. Why did you choose your specific project topic?

    I am interested in applying Gaussian processes to analyse real-world datasets that have temporal and spatial features. In my current research topic, I have been working on multi-output Gaussian processes (MOGPs) for generating people’s travel activity time. Thus, I would love to contribute to the PyMC library by adding an MOGP feature to PyMC’s GP module.

  3. How did you get involved in open source software?

    One of my old friends told me that it is a good idea to contribute to open source, so I can contribute and learn along the way. As I had used PyMC for some time in my work and found it really valuable, I decided to contribute to the PyMC project. My first pull request on PyMC GitHub was creating a helper pm.draw() function to take draws for a given variable. It took quite a while for the PR to be merged, but I learned useful things like writing docstrings and test cases.

  4. What are you expecting or hoping to get out of your GSoC experience?

    My project aims to add support for multi-output Gaussian processes (GPs) in PyMC. The advantage of multi-output GPs is their capacity to simultaneously learn and infer many outputs that share the same source of uncertainty from the inputs. This model provides a practical approach for various applications in different fields. Hence, the multi-output GP feature would significantly extend the capabilities of the PyMC GP module and benefit the PyMC community.

    I plan to incorporate a Linear Model of Coregionalization and a Hadamard Regression Model into the PyMC GP module. This project is an excellent opportunity to sharpen my coding skills, including designing user APIs and writing classes, docstrings, tests, and notebook examples. Furthermore, I hope to make more good friends and learn more from my mentors and other PyMC devs. They are extremely supportive, and I feel lucky to be involved in this project.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    I want to be an effective Data Scientist who can develop data-driven products to solve real-world problems and help businesses make efficient science-based decisions. I see the Bayesian approach as an intuitive and practical way to solve various problems, especially those that need to account for uncertainty. Of course, we also need other machine learning methods, depending on the specific use case.

    The GSoC project will allow me to learn more about Bayesian statistics, especially nonparametric models. This knowledge is valuable for developing various applications in different fields. In addition, I can improve my communication skills and the capability to work in a diverse and international team.

Shashank Kirtania

Project Name

Creating Base Class for deployment of PyMC models

Mentors

Bio

I am a final-year student at Thapar Institute of Engineering and Technology pursuing my major in computer engineering. I have been working in the domain of data science for a couple of years. I have worked on a few projects in the domain of computer vision and got a chance to work with a team working on Bayesian modelling earlier this year. I love to explore the field of data science and apply what I have learnt to various projects.

Connecting

  1. What motivated you to apply for GSoC with PyMC?

    I first heard about the open-source community while in my first year. Initially I did not understand how it worked, but with time the idea of open source appealed to me. Contributing to large code bases was a very tough task for me at the beginning, and I believe GSoC provides the right collaborative environment to contribute to such organizations.

  2. Why did you choose your specific project topic?

    Working on the deployment pipeline is a significant project. Having earlier had the chance to work with a team doing Bayesian modelling with PyMC, I knew why we needed this project. I felt this would be an excellent opportunity for me to work on the deployment of Bayesian models and to understand the models themselves better.

  3. How did you get involved in open source software?

    My first encounter with open-source software was in my first year of college, when I first heard about Hacktoberfest. Sadly, I wasn’t skilled enough to contribute productively to any of the projects. With time, as my skills improved, I got a chance to fix smaller beginner-friendly issues in PyMC itself. Later in the year, I got this opportunity to work on my first open-source project.

  4. What are you expecting or hoping to get out of your GSoC experience?

    I expect to complete my project and gain more experience deploying Bayesian models using PyMC. I have been working on my part to learn more about Bayesian modelling, and I am trying to understand its real-world use cases. Hopefully, by the end of the GSoC experience, I will be able to contribute more to related projects.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    In future, I do plan to build a career in the field of data science and machine learning; working on this project with PyMC is helping me improve my skill set and contribute to open source projects which I wouldn’t be able to without the collaborative environment that GSoC provides.

Yann McLatchie

Project Name

Projection predictive model selection for PyMC

Mentors

photo of Yann

Bio

I am currently a Master’s student in Machine Learning, Data Science, and Artificial Intelligence at Aalto University in Finland. My primary interests are in the field of Bayesian statistics, and more recently Bayesian methods of model selection. Alongside my studies, I work in the Probabilistic Machine Learning research group at Aalto under Aki Vehtari’s supervision.

Connecting

  1. What motivated you to apply for GSoC with PyMC?

    Osvaldo Martin (one of my mentors and a core PyMC developer) was at Aalto last year on a post-doc in the same research group as me, and we had previously discussed the need for projection predictive model selection to be implemented in Python, given the success it has had in R and its strong theoretical support. As such, we came up with the idea of incorporating it into a GSoC project and hey presto!

  2. Why did you choose your specific project topic?

    Over the past six months I have been researching projection predictive model selection as a research assistant. The opportunity to concretise my theoretical understanding into a Python package promises to deepen my practical understanding of the subject matter. I have also found the underlying theory hugely interesting, and embrace the opportunity to bring it to a wider community.

  3. How did you get involved in open source software?

    My first foray into open source software was an attempt to build a small Python package for stochastic simulation. This was initially daunting, but I found the learning and development process immensely rewarding. So much so that I followed it up with small pull requests to packages such as Bambi and eventually found myself interested in delving deeper into open source development.

  4. What are you expecting or hoping to get out of your GSoC experience?

    First and foremost I hope to produce Kulprit, a Python package bringing projection predictive model selection to Bambi, and in doing so also bring the methodology to a wider audience. I am confident that going through the process of developing a package from start to finish will teach me more not only about Python programming, but also about documenting and communicating the core theory and about working as part of an open source development team.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    I have loved working on research questions and implementing complex ideas in code, and would like to continue doing so in a research career. The GSoC program gives me the opportunity to turn important research questions into software and communicate them to an audience of practitioners.

Conor Hassan

Project Name

Efficient Inference for Latent Gaussian Models

Mentors

Photo of Conor

Bio

I am a PhD researcher at QUT, under the supervision of Kerrie Mengersen. My research focuses on federated learning (distributed estimation because of privacy constraints) of latent Gaussian and differentially private generative models.

Connecting

  1. What motivated you to apply for GSoC with PyMC?

    I have always wanted to contribute to a probabilistic programming environment! I chose PyMC primarily for four reasons: the community is welcoming; the community is active; Python; and I thought the changes made in V4 were super impressive and a sign of the community’s dedication.

  2. Why did you choose your specific project topic?

    As part of my research, I work on developing new inference methods (for latent Gaussian models). Because of our research direction, we focus on techniques including variational inference and INLA-like ideas. I have always been curious about improving methods for these models in something like PyMC, and then Dan Simpson’s blog popped up. Check it out!

  3. How did you get involved in open source software?

    This is my first time contributing to open source. Time to give back a little!

  4. What are you expecting or hoping to get out of your GSoC experience?

    I hope to contribute backend support that improves the efficiency of fitting latent Gaussian models in PyMC. These are the class of models that INLA fits so fast! But the INLA package does many intelligent things in the background that are unrelated to the specific inference method. These methods are what we will try to get into PyMC. Personally, what I hope to get out of this is a place in this community and relationships that last beyond GSoC and, hopefully, lead to continued contributions to PyMC in the future!

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    I love my research at the moment and am hoping to continue doing research on developing new Bayesian and machine learning inference methods, or to look at applying such techniques to complex problems. Either way, open-source libraries like PyMC will play a large part in what I want to do for work in the future!

Nicoleta Spînu

Project Name

Modelling and Forecasting Time-Series Gene Expressions Data in Mechanistic Toxicology

Mentors

Photo of Nicoleta

Bio

A pharmacist by training with a PhD in computational toxicology interested in artificial intelligence and personalised medicine.

Connecting

  1. What motivated you to apply for GSoC with PyMC?

    There were three main reasons: the topic of time series, the mentors and the supportive community of developers, and the opportunity to make personal contributions that expand the use and the applicability domain of PyMC.

  2. Why did you choose your specific project topic?

    The topic of time series is quite neglected in the field of drug discovery, e.g., in preclinical studies and clinical trials. Thus, the motivation was to learn more about time series analysis and state space models, including the Kalman filter, and also to see how implementation works and to showcase how PyMC can be used to model temporal data.

  3. How did you get involved in open source software?

    My first commit was during my PhD and I wanted it to not remain the only one! :) GSoC seems to be the perfect way for me to get involved in open source software, grasp what it actually is and, hopefully, continue as a contributor to PyMC.

  4. What are you expecting or hoping to get out of your GSoC experience?

    I hope to learn and improve software engineering skills, from implementation to testing, meet other scientists and developers interested in the topic of time series, and, ideally, solve at least one problem in chemical safety assessment using PyMC.

  5. What are your career goals? How do you see the GSoC program moving you towards them?

    GSoC through PyMC will contribute to forming a solid foundation in probabilistic programming and modelling. These skills will also allow me to develop models that will ultimately benefit patients, which is an aspiration of mine.