BAYESIAN NETWORKS GT CSCE 521

Statement of the problem

Given a Bayesian network and the probabilities for each node, compute the result of any probability query, either marginal or conditional

Motivation

The network becomes harder to solve each time a new node is added

Manual computation becomes infeasible at some point

We need a tool to automate the process and/or verify the results we get manually

Approaches

(1) Simulating a graph and solving the probabilities from top to bottom, using propagation

(2) Transferring the information into an SQL database and calculating the result of the query via a SELECT statement

I implemented the 2nd approach

Example - Creation of the tables

Equivalent in SQL, for Rain:

CREATE TABLE TableR (R boolean, C boolean, Prob number);
INSERT INTO TableR VALUES (1, 0, 0);
INSERT INTO TableR VALUES (0, 0, 1);
INSERT INTO TableR VALUES (1, 1, 0.7);
INSERT INTO TableR VALUES (0, 1, 0.3);

artificially inserted
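The same table can be built from Python with the standard sqlite3 module. This is a minimal sketch, not the original program: the in-memory database, the function name create_rain_table and the REAL column type are assumptions made for illustration; the four rows are the ones listed above.

```python
import sqlite3

def create_rain_table(conn):
    """Create TableR holding P(R | C), one row per (R, C) combination."""
    conn.execute("CREATE TABLE TableR (R BOOLEAN, C BOOLEAN, Prob REAL)")
    rows = [
        (1, 0, 0.0),  # P(R=1 | C=0)
        (0, 0, 1.0),  # P(R=0 | C=0)
        (1, 1, 0.7),  # P(R=1 | C=1)
        (0, 1, 0.3),  # P(R=0 | C=1)
    ]
    # Parameterized inserts avoid building SQL strings by hand.
    conn.executemany("INSERT INTO TableR VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # hypothetical throwaway database
create_rain_table(conn)

# Sanity check: for each value of C, the probabilities over R should sum to 1.
sums = conn.execute(
    "SELECT C, SUM(Prob) FROM TableR GROUP BY C ORDER BY C"
).fetchall()
print(sums)
```

Checking that every group sums to 1 is a cheap way to catch a mistyped probability before any query is run.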

Example – creating the queries

P(C | G) = ?

P(C | G) = P(C, G) / P(G)

P(C, G) = ∑ P(C) * P(x1 | C) * P(x2 | C) * P(G | x1, x2), where x1 ∈ {S, ¬S} and x2 ∈ {R, ¬R}

P(G) = ∑ P(y) * P(x1 | y) * P(x2 | y) * P(G | x1, x2), where y ∈ {C, ¬C}

Equivalent in SQL for P ( C, G ):

SELECT SUM(TableC.Prob * TableS.Prob * TableR.Prob * TableG.Prob)
FROM TableC, TableS, TableR, TableG
WHERE TableC.C = TableS.C AND TableC.C = TableR.C
  AND TableG.S = TableS.S AND TableG.R = TableR.R
  AND TableC.C = 1 AND TableG.G = 1;
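The ratio P(C, G) / P(G) can be checked end to end with a short Python script. This is a sketch under stated assumptions: only the Rain probabilities come from the slides; the values for C, S and G are made up for illustration, and the helper cpt_rows is hypothetical.

```python
import sqlite3

def cpt_rows(p_true):
    """Expand {parent values: P(node=1 | parents)} into rows for node=1 and node=0."""
    rows = []
    for parents, p in sorted(p_true.items()):
        rows.append((1,) + parents + (p,))
        rows.append((0,) + parents + (1.0 - p,))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableC (C BOOLEAN, Prob REAL)")
conn.execute("CREATE TABLE TableS (S BOOLEAN, C BOOLEAN, Prob REAL)")
conn.execute("CREATE TABLE TableR (R BOOLEAN, C BOOLEAN, Prob REAL)")
conn.execute("CREATE TABLE TableG (G BOOLEAN, S BOOLEAN, R BOOLEAN, Prob REAL)")

# Assumed values, except TableR, which uses the numbers from the slides.
conn.executemany("INSERT INTO TableC VALUES (?, ?)", cpt_rows({(): 0.5}))
conn.executemany("INSERT INTO TableS VALUES (?, ?, ?)",
                 cpt_rows({(0,): 0.5, (1,): 0.1}))
conn.executemany("INSERT INTO TableR VALUES (?, ?, ?)",
                 cpt_rows({(0,): 0.0, (1,): 0.7}))
conn.executemany("INSERT INTO TableG VALUES (?, ?, ?, ?)",
                 cpt_rows({(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}))

# Shared part of both queries: product of all tables, linked on parent columns.
JOINT = (
    "SELECT SUM(TableC.Prob * TableS.Prob * TableR.Prob * TableG.Prob) "
    "FROM TableC, TableS, TableR, TableG "
    "WHERE TableC.C = TableS.C AND TableC.C = TableR.C "
    "AND TableG.S = TableS.S AND TableG.R = TableR.R"
)

p_cg = conn.execute(JOINT + " AND TableC.C = 1 AND TableG.G = 1").fetchone()[0]
p_g = conn.execute(JOINT + " AND TableG.G = 1").fetchone()[0]
p_c_given_g = p_cg / p_g
print("P(C, G) =", p_cg, " P(G) =", p_g, " P(C | G) =", p_c_given_g)
```

The only difference between the two queries is the extra constraint on TableC.C: leaving a variable unconstrained in the WHERE clause is what sums it out.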

Data Format

Upload text files

Network description, each line has the format: [nodeName] [listOfPredecessors] [probabilities]

probabilities -> 2^n numbers (for n predecessors), ordered from the probability when all predecessors are 0 to the probability when all are 1 (ascending)

One empty line

For the queries, each line has the format: [nodeName] [listOfConditionals]

meaning: P(nodeName | listOfConditionals)

listOfConditionals can also be empty
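A parser for this format could look like the sketch below. The slides do not specify the exact syntax, so several things here are assumptions: tokens are separated by whitespace, node names are non-numeric, both sections live in one text separated by the empty line, and nodes are boolean, so a node needs one probability per combination of predecessor values. The names is_number and parse_input are hypothetical.

```python
def is_number(token):
    """Return True if the token parses as a float (i.e. it is a probability)."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def parse_input(text):
    """Split the text into (network, queries) at the first empty line."""
    network_part, _, query_part = text.partition("\n\n")

    network = []
    for line in network_part.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        name = tokens[0]
        preds = [t for t in tokens[1:] if not is_number(t)]
        probs = [float(t) for t in tokens[1:] if is_number(t)]
        # Boolean predecessors give 2**n combinations of parent values.
        if len(probs) != 2 ** len(preds):
            raise ValueError("wrong number of probabilities for " + name)
        network.append((name, preds, probs))

    queries = []
    for line in query_part.splitlines():
        tokens = line.split()
        if tokens:
            # First token is the queried node, the rest are the conditionals.
            queries.append((tokens[0], tokens[1:]))
    return network, queries

sample = "C 0.5\nS C 0.5 0.1\nR C 0 0.7\nG S R 0 0.9 0.9 0.99\n\nC G\nG\n"
network, queries = parse_input(sample)
print(network)
print(queries)
```

In the sample, the last line is a query with an empty list of conditionals, which corresponds to the marginal P(G).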

Implementation

Python 3.3.2

SQLite for communicating with a tiny database

Tkinter for the graphical interface

Class Node – information about a node (name, predecessors)

For each line read in the file: create a new instance of Node and a new table to store the probabilities

For each query: split it into 2 joint probabilities, P(A | B) = P(A, B) / P(B)

To solve a joint distribution: create 2 lists

Initial list – has all the nodes which appear in the joint distribution; useful for adding constraints in the WHERE clause

Extended list = initial list + all nodes which are ancestors of elements from the initial list; useful for selecting the tables (FROM clause)

The linking between the tables is done using the predecessors field from class Node
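The two lists and the linking step can be sketched as follows. This is an illustration, not the original code: the function names, the "Table" + name convention taken from the earlier example, and the choice to constrain every queried node to 1 are assumptions.

```python
class Node:
    """Information about a node: its name and the names of its predecessors."""
    def __init__(self, name, predecessors):
        self.name = name
        self.predecessors = predecessors

def extended_list(nodes, initial):
    """Initial list plus all ancestors, found by following predecessor links."""
    seen = []
    stack = list(initial)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.append(name)
            stack.extend(nodes[name].predecessors)
    return sorted(seen)

def joint_query(nodes, initial):
    """Build the SELECT statement for the joint probability of the initial list."""
    extended = extended_list(nodes, initial)
    product = " * ".join("Table{0}.Prob".format(n) for n in extended)
    tables = ", ".join("Table{0}".format(n) for n in extended)
    conditions = []
    # Link each table to the tables of its predecessors (join conditions).
    for name in extended:
        for pred in nodes[name].predecessors:
            conditions.append("Table{0}.{1} = Table{1}.{1}".format(name, pred))
    # Constrain the queried nodes; here every one is fixed to 1 (assumption).
    for name in initial:
        conditions.append("Table{0}.{0} = 1".format(name))
    return "SELECT SUM({0}) FROM {1} WHERE {2}".format(
        product, tables, " AND ".join(conditions))

nodes = {n.name: n for n in [
    Node("C", []), Node("S", ["C"]), Node("R", ["C"]), Node("G", ["S", "R"]),
]}
sql = joint_query(nodes, ["C", "G"])
print(sql)
```

Restricting the FROM clause to the ancestors is safe because any node that is not an ancestor of a queried node sums out to 1 and does not change the result.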

Results & Demo

Questions?