BAYESIAN NETWORKS GT CSCE 521

Statement of the problem 

Given a Bayesian network and the probability table of each node, compute the result of any probability query, either marginal or conditional.

Motivation

The network becomes harder to solve by hand each time a new node is added, and at some point manual computation becomes unfeasible.

We need a tool to automate the process and/or verify the results we get manually.

Approaches 

(1) Building the graph and computing the probabilities from top to bottom, using propagation

(2) Transferring the information into a SQL database and computing the result of the query with a SELECT statement

I implemented the second approach.

Example - Creation of the tables

Equivalent in SQL, for Rain:

CREATE TABLE TableR (R boolean, C boolean, Prob number);
INSERT INTO TableR VALUES (1, 0, 0);
INSERT INTO TableR VALUES (0, 0, 1);
INSERT INTO TableR VALUES (1, 1, 0.7);
INSERT INTO TableR VALUES (0, 1, 0.3);
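Rows like these can be generated rather than typed. A minimal sketch, assuming Python 3.5+ with the standard sqlite3 module, and assuming each input number is P(node = 1 | predecessors), listed in ascending binary order with the first predecessor as the most significant bit. The helper name create_node_table is hypothetical, not taken from the original implementation.

```python
import itertools
import sqlite3


def create_node_table(conn, node, parents, probs_true):
    """Create Table<node> holding P(node | parents) for every assignment.

    Assumption: probs_true[i] is P(node = 1 | i-th parent assignment), with
    assignments in ascending binary order (all 0 first, all 1 last) and the
    first parent as the most significant bit.
    """
    if len(probs_true) != 2 ** len(parents):
        raise ValueError("expected 2^n probabilities for n parents")
    columns = ["{} boolean".format(c) for c in [node] + parents] + ["Prob number"]
    conn.execute("CREATE TABLE Table{} ({})".format(node, ", ".join(columns)))
    insert = "INSERT INTO Table{} VALUES ({})".format(
        node, ", ".join(["?"] * (len(parents) + 2)))
    # itertools.product yields (0, 0), (0, 1), (1, 0), (1, 1), ... in ascending order
    assignments = itertools.product((0, 1), repeat=len(parents))
    for p_true, assignment in zip(probs_true, assignments):
        conn.execute(insert, (1, *assignment, p_true))      # row given in the input
        conn.execute(insert, (0, *assignment, 1 - p_true))  # complement row


conn = sqlite3.connect(":memory:")
create_node_table(conn, "R", ["C"], [0, 0.7])  # P(R = 1 | C = 0), P(R = 1 | C = 1)
rows = conn.execute("SELECT R, C, Prob FROM TableR ORDER BY C, R DESC").fetchall()
print(rows)
```

The complement rows are computed in floating point, so 1 - 0.7 is stored as a value very close to, but not exactly, 0.3.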

The rows with R = 0 are artificially inserted as complements: P(¬R | C) = 1 − P(R | C).

Example – creating the queries: P(C | G) = ?

$$P(C \mid G) = \frac{P(C, G)}{P(G)}$$

$$P(C, G) = \sum_{x_1 \in \{S, \neg S\}} \sum_{x_2 \in \{R, \neg R\}} P(C)\, P(x_1 \mid C)\, P(x_2 \mid C)\, P(G \mid x_1, x_2)$$

$$P(G) = \sum_{y \in \{C, \neg C\}} \sum_{x_1 \in \{S, \neg S\}} \sum_{x_2 \in \{R, \neg R\}} P(y)\, P(x_1 \mid y)\, P(x_2 \mid y)\, P(G \mid x_1, x_2)$$
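These sums can be checked by brute-force enumeration. A minimal Python sketch: only the P(R | C) values come from TableR; P(C), P(S | C) and P(G | S, R) are hypothetical numbers chosen for illustration, and each table stores P(node = 1 | parents).

```python
from itertools import product

# Each entry is P(node = 1 | parent values).
# Only P_R comes from TableR; the other values are hypothetical.
P_C = 0.5                                   # hypothetical prior P(C)
P_S = {0: 0.5, 1: 0.1}                      # hypothetical P(S | C)
P_R = {0: 0.0, 1: 0.7}                      # P(R | C), from TableR
P_G = {(0, 0): 0.0, (0, 1): 0.9,            # hypothetical P(G | S, R)
       (1, 0): 0.9, (1, 1): 0.99}


def bern(p_true, value):
    """Probability that a binary variable equals `value`, given P(variable = 1)."""
    return p_true if value == 1 else 1 - p_true


def joint_cg(c, g):
    """P(C = c, G = g): sum over x1 in {S, ¬S} and x2 in {R, ¬R}."""
    return sum(
        bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_G[(s, r)], g)
        for s, r in product((0, 1), repeat=2)
    )


p_cg = joint_cg(1, 1)                  # P(C, G)
p_g = joint_cg(0, 1) + joint_cg(1, 1)  # P(G): sum over y in {C, ¬C}
print("P(C | G) =", p_cg / p_g)
```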

Equivalent in SQL for P(C, G):

SELECT SUM(TableC.Prob * TableS.Prob * TableR.Prob * TableG.Prob)
FROM TableC, TableS, TableR, TableG
WHERE TableC.C = TableS.C AND TableC.C = TableR.C
  AND TableG.S = TableS.S AND TableG.R = TableR.R
  AND TableC.C = 1 AND TableG.G = 1;
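To try the query end to end, the sketch below loads four tables into an in-memory SQLite database and runs the SELECT for P(C, G), plus the same join without the TableC.C = 1 constraint, which sums C out and gives P(G). Only TableR's values come from the slides; the other probabilities are hypothetical, and Python 3.5+ is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# node -> (parents, {parent assignment: P(node = 1 | assignment)})
# Only the R entries come from the slides; the rest are hypothetical.
cpts = {
    "C": ([], {(): 0.5}),
    "S": (["C"], {(0,): 0.5, (1,): 0.1}),
    "R": (["C"], {(0,): 0.0, (1,): 0.7}),
    "G": (["S", "R"], {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}),
}
for node, (parents, p_true) in cpts.items():
    columns = ["{} boolean".format(c) for c in [node] + parents] + ["Prob number"]
    conn.execute("CREATE TABLE Table{} ({})".format(node, ", ".join(columns)))
    insert = "INSERT INTO Table{} VALUES ({})".format(
        node, ", ".join(["?"] * (len(parents) + 2)))
    for assignment, p in p_true.items():
        conn.execute(insert, (1, *assignment, p))      # node = 1
        conn.execute(insert, (0, *assignment, 1 - p))  # node = 0 (complement)

JOIN = """
SELECT SUM(TableC.Prob * TableS.Prob * TableR.Prob * TableG.Prob)
FROM TableC, TableS, TableR, TableG
WHERE TableC.C = TableS.C AND TableC.C = TableR.C
  AND TableG.S = TableS.S AND TableG.R = TableR.R
  AND TableG.G = 1
"""
p_cg = conn.execute(JOIN + "  AND TableC.C = 1").fetchone()[0]  # P(C, G)
p_g = conn.execute(JOIN).fetchone()[0]                           # P(G), C summed out
print("P(C | G) =", p_cg / p_g)
```

Each row of the join is one full assignment of C, S, R, G consistent with the constraints, so SUM over the product of the Prob columns is exactly the sum in the formulas.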

Data Format 

Network description: each line has the format [nodeName] [listOfPredecessors] [probabilities]

probabilities -> 2^n numbers (for n predecessors), one per combination of predecessor values, in ascending order: from the probability when all predecessors are 0 to the probability when all are 1

One empty line separates the network description from the queries.

Queries: each line has the format [nodeName] [listOfConditionals], meaning P(nodeName | listOfConditionals); listOfConditionals can also be empty (a marginal probability).
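A reader for this format might look like the sketch below. The format description leaves some details open, so the following are assumptions: tokens are whitespace-separated, node names are never numbers (so any token that parses as a float is a probability), and the separator is a truly empty line. parse_input and the sample values for C, S and G are hypothetical; only R's numbers match TableR.

```python
def _is_number(token):
    """True if the token parses as a float (assumed to mark a probability)."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_input(text):
    """Return ({node: (predecessors, probabilities)}, [(node, conditionals)])."""
    network_part, _, query_part = text.strip().partition("\n\n")
    nodes = {}
    for line in network_part.splitlines():
        name, *rest = line.split()
        parents = [t for t in rest if not _is_number(t)]
        probs = [float(t) for t in rest if _is_number(t)]
        if len(probs) != 2 ** len(parents):  # one number per predecessor combination
            raise ValueError("{}: expected {} probabilities, got {}".format(
                name, 2 ** len(parents), len(probs)))
        nodes[name] = (parents, probs)
    queries = []
    for line in query_part.splitlines():
        if line.strip():
            target, *conditionals = line.split()  # conditionals may be empty
            queries.append((target, conditionals))
    return nodes, queries


SAMPLE = """C 0.5
S C 0.5 0.1
R C 0 0.7
G S R 0 0.9 0.9 0.99

C G
R
"""
nodes, queries = parse_input(SAMPLE)
print(nodes["R"], queries)
```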

Implementation 

Python 3.3.2

SQLite for storing the probability tables in a small database and running the queries

Tkinter for the graphical interface

Class Node – information about a node (name, predecessors)

For each line read from the file: create a new instance of Node and a new table to store its probabilities

For each query: split it into two joint probabilities, P(nodeName, listOfConditionals) and P(listOfConditionals), and divide the first by the second

To solve a joint distribution: create 2 lists

Initial list – all the nodes that appear in the joint distribution; used to add constraints to the WHERE clause

Extended list = initial list + all ancestors of the nodes in the initial list; used to select the tables (FROM clause)
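The query split and the two lists can be sketched as follows, assuming the predecessors of each node are available as a plain dict (standing in for the predecessors field of class Node). split_query, extended_list and PARENTS are hypothetical names; an empty joint (a query with no conditionals) yields an empty list, whose probability is 1.

```python
def split_query(target, conditionals):
    """P(target | conds) = P(target, conds) / P(conds): return both initial lists."""
    return [target] + conditionals, list(conditionals)


def extended_list(initial, parents):
    """Initial list plus every ancestor of its nodes (the tables for FROM)."""
    result, stack = [], list(initial)
    while stack:
        node = stack.pop()
        if node not in result:
            result.append(node)
            stack.extend(parents[node])  # walk up through the predecessors
    return result


# Hypothetical predecessor lists for the C, S, R, G example network
PARENTS = {"C": [], "S": ["C"], "R": ["C"], "G": ["S", "R"]}

numerator, denominator = split_query("C", ["G"])
print(extended_list(numerator, PARENTS), extended_list(denominator, PARENTS))
```

For P(C | G) both extended lists contain all four nodes, which is why the P(G) formula also sums over C.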

The tables are linked in the WHERE clause using the predecessors field of class Node: each node's table is joined to its predecessors' tables on the shared columns.
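Putting the lists and the linking together, a SELECT like the one for P(C, G) can be generated mechanically. A minimal sketch: build_query, its arguments, and the "SELECT 1" shortcut for an empty joint are assumptions rather than the original code. Node names are inserted into the SQL text directly, so they are assumed to be trusted identifiers, and evidence values are assumed to be 0 or 1.

```python
def build_query(extended, constraints, parents):
    """Build the SELECT for one joint probability.

    extended    -- extended list (tables for the FROM clause)
    constraints -- {node: 0 or 1} for the initial-list nodes (WHERE constants)
    parents     -- {node: [predecessors]}
    """
    if not extended:
        return "SELECT 1"  # empty joint: probability 1
    tables = ["Table" + n for n in extended]
    conditions = []
    for node in extended:
        for p in parents[node]:
            # Link the node's table to its predecessor's table on the shared column
            conditions.append("Table{0}.{1} = Table{1}.{1}".format(node, p))
    for node, value in constraints.items():
        conditions.append("Table{0}.{0} = {1}".format(node, int(value)))
    sql = "SELECT SUM({}) FROM {}".format(
        " * ".join(t + ".Prob" for t in tables), ", ".join(tables))
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


PARENTS = {"C": [], "S": ["C"], "R": ["C"], "G": ["S", "R"]}
print(build_query(["C", "S", "R", "G"], {"C": 1, "G": 1}, PARENTS))
```

Up to the order of the equalities, the generated text matches the hand-written P(C, G) query.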

Results & Demo

Questions?
