The ability to ask questions of your data set has always been an intriguing prospect. You may be surprised how easy it is to learn a local Bayesian model that you can use to interrogate your data.
With the rise of ChatGPT-like models, analyzing your own data set and, so to speak, “asking questions” of it has become accessible to a broader audience. Although this is great, such an approach also has disadvantages when used as an analytical step in automated pipelines, especially when the outcome of a model can have a significant impact. To maintain control and ensure results are accurate, we can also use Bayesian inference to talk to our data set. In this blog, we will go through the steps of learning a Bayesian model and applying do-calculus to the data science salary data set. I will demonstrate how to create a model that allows you to “ask questions” of your data set while maintaining control. You will be surprised by how easy it is to create such a model with the bnlearn library.
Extracting valuable insights from data sets is an ongoing challenge for data scientists and analysts. ChatGPT-like models have made it easier to analyze data sets interactively, but at the same time the analysis can become less transparent, and it may even be unknown why certain choices are made. Relying on such black-box approaches is far from ideal in automated analytical pipelines. Creating transparent models is especially important when the outcome of a model strongly influences the actions that are taken.
The ability to communicate effectively with data sets has always been an intriguing prospect for researchers and practitioners alike.
In the next sections, I will first introduce the bnlearn library and how it can be used to learn causal networks. Then I will demonstrate how to learn a causal network from a mixed data set, and how to apply do-calculus to query the data set effectively. Let’s see how Bayesian inference can help us interact with our data sets!