MolGAN: Generative model for molecular graphs
Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: instead of working with string representations, we can optimize differentiable models that directly generate molecular graphs. In this post I am writing about MolGAN, an implicit, likelihood-free generative model for small molecular graphs. Here we use generative adversarial networks (GANs) that operate directly on graph-structured data, and we combine this approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties.
Dataset - In our implementation we used the QM9 chemical database together with the RDKit library, and we found that the model is capable of generating close to 100% valid compounds.
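As a quick illustration of how validity can be measured with RDKit, here is a minimal sketch (the example SMILES strings are hand-written for illustration, not actual QM9 entries):

```python
from rdkit import Chem

def validity_fraction(smiles_list):
    """Fraction of SMILES strings that RDKit can parse into a molecule."""
    valid = sum(Chem.MolFromSmiles(s) is not None for s in smiles_list)
    return valid / max(len(smiles_list), 1)

# Illustrative samples (not QM9 entries); the last string cannot be parsed.
samples = ["CCO", "c1ccccc1", "C(C(=O)O)N", "not_a_molecule"]
print(validity_fraction(samples))  # 0.75
```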
Introduction – Generating new chemical compounds with desired properties is a challenging task with important applications such as drug discovery (Schneider & Fechner, 2005). Most works in this area make use of a so-called SMILES representation (I have explained the SMILES representation of molecules in my previous blog), a string-based representation derived from molecular graphs. String-based representations of molecules, however, have certain disadvantages: RNNs have to spend capacity on learning both the syntactic rules and the order ambiguity of the representation. Our molecular GAN (MolGAN) model (outlined in Figure 1) addresses the generation of graph-structured data in the context of molecular synthesis using GANs. The generative model of MolGAN predicts the discrete graph structure at once (i.e., non-sequentially) for computational efficiency, although sequential variants are possible in general. MolGAN further utilizes a permutation-invariant discriminator and reward network (for RL-based optimization towards desired chemical properties) based on graph convolution layers (GCN, link…) that both operate directly on graph-structured representations.

A vector z is sampled from a prior and passed to the generator which outputs the graph representation of a molecule. The discriminator classifies whether the molecular graph comes from the generator or the dataset. The reward network tries to estimate the reward for the chemical properties of a particular molecule provided by an external software.
Our goal is to obtain a generator that can output valid molecular structures with good properties.
Molecules as graphs - The SMILES syntax is not robust to small changes or mistakes, which can result in the generation of invalid or drastically different structures. Instead, we consider that each molecule can be represented by an undirected graph G with a set of edges E and nodes V. Each atom corresponds to a node vi ∈ V that is associated with a T-dimensional one-hot vector xi indicating the type of the atom. We further represent each atomic bond as an edge (vi, vj) ∈ E associated with a bond type y ∈ {1, …, Y}. For a molecular graph with N nodes, we can summarize this representation in a node feature matrix X = [x1, …, xN]ᵀ ∈ R^(N×T) and an adjacency tensor A ∈ R^(N×N×Y), where Aij ∈ R^Y is a one-hot vector indicating the type of the edge between nodes i and j.
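To make this representation concrete, here is a minimal sketch of how a single molecule could be encoded into X and A with RDKit and NumPy. The atom and bond vocabularies and the maximum of 9 heavy atoms are illustrative choices for QM9-sized molecules, not necessarily the exact settings used in MolGAN:

```python
import numpy as np
from rdkit import Chem

# Illustrative vocabularies (assumptions, not the exact MolGAN settings).
ATOM_TYPES = ["C", "N", "O", "F"]                                  # T = 4 atom types
BOND_TYPES = [Chem.BondType.SINGLE, Chem.BondType.DOUBLE,
              Chem.BondType.TRIPLE, Chem.BondType.AROMATIC]        # Y = 4 bond types

def encode(smiles, max_nodes=9):
    """Encode a molecule as a node feature matrix X (N x T) and
    an adjacency tensor A (N x N x Y) of one-hot bond types."""
    mol = Chem.MolFromSmiles(smiles)
    N, T, Y = max_nodes, len(ATOM_TYPES), len(BOND_TYPES)
    X = np.zeros((N, T))
    A = np.zeros((N, N, Y))
    for atom in mol.GetAtoms():
        X[atom.GetIdx(), ATOM_TYPES.index(atom.GetSymbol())] = 1.0
    for bond in mol.GetBonds():
        i, j = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        y = BOND_TYPES.index(bond.GetBondType())
        A[i, j, y] = A[j, i, y] = 1.0      # undirected graph: symmetric entries
    return X, A

X, A = encode("CC(=O)O")   # acetic acid (hydrogens are implicit)
print(X.shape, A.shape)    # (9, 4) (9, 9, 4)
```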
Generative adversarial networks - GANs are implicit generative models in the sense that they allow for inference of model parameters without requiring one to specify a likelihood. A GAN consists of two main components: a generative model Gθ, which learns a map from a prior to the data distribution in order to sample new data points, and a discriminative model Dφ, which learns to classify whether samples came from the data distribution rather than from Gθ. The two models are implemented as neural networks and trained simultaneously with stochastic gradient descent (SGD). Gθ and Dφ have different objectives, and they can be seen as two players in a minimax game

min_θ max_φ E_{x∼p_data(x)} [log Dφ(x)] + E_{z∼p_z(z)} [log(1 − Dφ(Gθ(z)))],

where Gθ tries to generate samples that fool the discriminator and Dφ tries to classify samples correctly. To prevent undesired behaviour such as mode collapse (Salimans et al., 2016) and to stabilize learning, we can use minibatch discrimination and the improved WGAN, an alternative and more stable GAN formulation that minimizes a better-suited divergence.
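For intuition, here is a minimal sketch of the vanilla minimax objective above in PyTorch (MolGAN actually trains with the improved WGAN objective, which is omitted here; G and D stand for any generator and discriminator modules):

```python
import torch
import torch.nn.functional as F

def gan_losses(G, D, real_x, z):
    """Minimal (non-saturating) GAN losses for one batch.
    G maps noise z to generated samples; D outputs one logit per sample."""
    fake_x = G(z)
    real_logits = D(real_x)
    fake_logits = D(fake_x.detach())   # detach: no generator gradient in the D update
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    g_logits = D(fake_x)               # generator update: make D call the samples "real"
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    return d_loss, g_loss
```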
Reinforcement learning for molecular design -
A GAN generator learns a transformation from a prior distribution to the data distribution, so generated samples resemble data samples. However, in drug design we are not only interested in generating chemically valid compounds; we also want them to have some useful property (e.g., to be easily synthesizable). Therefore, we additionally optimize the generation process towards some non-differentiable metrics using reinforcement learning.

Deterministic policy gradients - Policy gradient methods update the probability distribution over actions so that actions with higher expected reward become more probable in a given state. A policy can be either deterministic or stochastic. A deterministic policy maps a state to an action: you give it a state and the function returns the action to take. It is represented by µθ(s) = a, which deterministically outputs an action a. A stochastic policy outputs a probability distribution over actions. It is represented by πθ(s) = pθ(a|s), a parametric probability distribution in θ that selects a categorical action a conditioned on an environmental state s. We initially experimented with REINFORCE in combination with a stochastic policy that models graph generation as a set of categorical choices (actions). However, we found that it converged poorly due to the high-dimensional action space when generating graphs at once. We instead base our method on a deterministic policy gradient algorithm, which is known to perform well in high-dimensional action spaces. In our case:
Action: generation of a molecule,
Environment/Reward: biochemical evaluation of the molecule,
Policy: the generative model.
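To make the Environment/Reward part concrete, here is a hedged sketch of what the external reward could look like, using RDKit's QED drug-likeness score as an illustrative property and assigning zero reward to invalid molecules (the property choice and function name are assumptions for illustration):

```python
from rdkit import Chem
from rdkit.Chem import QED

def external_reward(smiles):
    """Reward provided by external software: here RDKit's QED drug-likeness
    score, used as an illustrative property. Invalid molecules get zero reward."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:          # the generated graph is not a valid compound
        return 0.0
    return QED.qed(mol)      # scalar in [0, 1]

print(external_reward("CCO"))    # small positive score for ethanol
print(external_reward("C1CC"))   # unclosed ring -> invalid -> 0.0
```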
Model - The MolGAN architecture consists of three main components: a generator Gθ, a discriminator Dφ and a reward network R̂ψ.

From left: the generator takes a sample from a prior distribution and generates a dense adjacency tensor A and an annotation matrix X. Subsequently, sparse and discrete Ã and X̃ are obtained from A and X respectively via categorical sampling. The combination of Ã and X̃ represents an annotated molecular graph which corresponds to a specific chemical compound. Finally, the graph is processed by both the discriminator and the reward network, which are invariant to node order permutations and based on Relational-GCN layers.
Nodes and edges of G are associated with annotations denoting atom type and bond type respectively.
The discriminator takes samples from both the dataset and the generator and learns to distinguish them. Both Gθ and Dφ are trained adversarially such that the generator learns to match the empirical distribution and eventually outputs valid molecules. The reward network is used to approximate the reward function of a sample and to optimize molecule generation towards non-differentiable metrics using reinforcement learning. Dataset and generated samples are inputs of R̂ψ but, differently from the discriminator, it assigns scores to them (e.g., how likely the generated molecule is to be soluble in water). The reward network learns to assign a reward to each molecule that matches a score provided by an external software. Notice that, when MolGAN outputs a non-valid molecule, it is not possible to assign a reward since the graph is not even a compound. Thus, for invalid molecular graphs, we assign zero reward.

Generator - Gθ(z) takes D-dimensional vectors z ∈ R^D sampled from a standard normal distribution z ∼ N(0, I) and outputs graphs. While it would be feasible to generate small graphs with an RNN-based generative model, for simplicity we utilize a generative model that predicts the entire graph at once using a simple multi-layer perceptron (MLP). While this limits our study to graphs of a pre-chosen maximum size, we find that it is significantly faster and easier to optimize. We restrict the domain to graphs with a limited number of nodes and, for each z, Gθ outputs two continuous and dense objects: X ∈ R^(N×T) that defines atom types and A ∈ R^(N×N×Y) that defines bond types. Both X and A have a probabilistic interpretation since each node and edge type is represented with probabilities of categorical distributions over types. To generate a molecule we obtain discrete, sparse objects X̃ and Ã via categorical sampling from X and A, respectively. We overload notation and also represent samples from the dataset with binary X̃ and Ã. As this discretization process is non-differentiable, we explore three model variations to allow for gradient-based training:
We can (i) use the continuous objects X and A directly during the forward pass (i.e., X̃ = X and Ã = A);
(ii) add Gumbel noise to X and A before passing them to Dφ and R̂ψ in order to make the generation stochastic while still forwarding continuous objects (i.e., X̃ij = Xij + Gumbel(µ = 0, β = 1) and Ãijy = Aijy + Gumbel(µ = 0, β = 1));
(iii) use a straight-through gradient based on categorical re-parameterization with the Gumbel-Softmax, that is, we use a sample from a categorical distribution during the forward pass (i.e., X̃i = Cat(Xi) and Ãij = Cat(Aij)) and the continuous relaxed values (i.e., the original X and A) in the backward pass.
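Here is a hedged sketch of variant (iii) in PyTorch, using torch.nn.functional.gumbel_softmax with hard=True as the straight-through estimator. The MLP layout and the sizes are illustrative assumptions, not the exact MolGAN architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphGenerator(nn.Module):
    """MLP that maps a noise vector z to dense logits for X (N x T) and A (N x N x Y)."""
    def __init__(self, z_dim=32, n_nodes=9, n_atom_types=5, n_bond_types=5):
        super().__init__()
        self.n, self.t, self.y = n_nodes, n_atom_types, n_bond_types
        self.mlp = nn.Sequential(
            nn.Linear(z_dim, 128), nn.Tanh(),
            nn.Linear(128, n_nodes * n_atom_types + n_nodes * n_nodes * n_bond_types))

    def forward(self, z):
        out = self.mlp(z)
        x_logits = out[:, :self.n * self.t].view(-1, self.n, self.t)
        a_logits = out[:, self.n * self.t:].view(-1, self.n, self.n, self.y)
        # Average to get symmetric bond logits (for a truly undirected graph one
        # would sample only the upper triangle and mirror it; omitted for brevity).
        a_logits = (a_logits + a_logits.transpose(1, 2)) / 2
        return x_logits, a_logits

G = GraphGenerator()
z = torch.randn(4, 32)                       # batch of 4 noise vectors
x_logits, a_logits = G(z)
# Straight-through Gumbel-Softmax: discrete one-hot samples in the forward pass,
# gradients flow through the continuous relaxation in the backward pass.
x_hat = F.gumbel_softmax(x_logits, tau=1.0, hard=True, dim=-1)
a_hat = F.gumbel_softmax(a_logits, tau=1.0, hard=True, dim=-1)
print(x_hat.shape, a_hat.shape)              # (4, 9, 5) (4, 9, 9, 5)
```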
Discriminator and reward network - Both the discriminator Dφ and the reward network R̂ψ receive a graph as input, and each outputs a scalar value. We choose the same architecture for both networks but do not share parameters between them. The generator output is first processed by a graph convolutional network (GCN) (for a full understanding of GCNs, please read this blog: …), and the GCN output then goes to the discriminator and the reward network.
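Below is a minimal sketch of a single relational graph convolution layer of the kind used here (one weight matrix per bond type plus a self-connection); the exact normalization and parameterization in MolGAN may differ:

```python
import torch
import torch.nn as nn

class RelationalGraphConvLayer(nn.Module):
    """H' = tanh( H W_self + sum_y A[:, :, y] H W_y ): one weight matrix per bond type."""
    def __init__(self, in_dim, out_dim, n_bond_types):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, out_dim)
        self.rel_lins = nn.ModuleList(nn.Linear(in_dim, out_dim, bias=False)
                                      for _ in range(n_bond_types))

    def forward(self, h, a):
        # h: (batch, N, in_dim) node features; a: (batch, N, N, Y) relational adjacency
        out = self.self_lin(h)
        for y, lin in enumerate(self.rel_lins):
            out = out + torch.bmm(a[..., y], lin(h))   # aggregate neighbours per bond type
        return torch.tanh(out)

layer = RelationalGraphConvLayer(in_dim=5, out_dim=16, n_bond_types=5)
h = torch.randn(4, 9, 5)          # e.g. the node annotations X̃
a = torch.rand(4, 9, 9, 5)        # e.g. the (relaxed) adjacency tensor Ã
print(layer(h, a).shape)          # (4, 9, 16)
```

A permutation-invariant readout (for instance, summing or averaging the node embeddings) followed by a small MLP would then produce the scalar outputs of Dφ and R̂ψ.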
A series of graph convolution layers convolve the node signals X̃ using the graph adjacency tensor Ã. We base our model on Relational-GCN, a convolutional network for graphs with support for multiple edge types (for how GCNs work on graph data, please read: link…).

For the reinforcement learning part, we employ a version of deep deterministic policy gradient (DDPG), an off-policy actor-critic algorithm that uses deterministic policy gradients to maximize an approximation of the expected future reward. In our case, the policy is the GAN generator Gθ, which takes a sample z from the prior as input, instead of an environmental state s, and outputs a molecular graph as an action (a = G). Moreover, we do not model episodes, so there is no need to assess the quality of a state-action combination since it only depends on the graph G. Therefore, we introduce a learnable and differentiable approximation of the reward function, R̂ψ(G), that predicts the immediate reward, and we train it via a mean squared error objective based on the real reward provided by an external system (e.g., the synthesizability score of a molecule). Then, we train the generator to maximize the predicted reward via R̂ψ(G), which, being differentiable, provides a gradient to the policy towards the desired metric; a rough sketch of this step is given below.

PyTorch implementation: please visit our GitHub profile (link…).
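The following is a minimal sketch of that reward-learning step, assuming hypothetical modules G (generator) and R_hat (reward network) and a hypothetical graph_reward callable that queries the external software (returning zero for invalid molecules); the adversarial WGAN term and how it is mixed with the RL objective are omitted:

```python
import torch
import torch.nn.functional as F

def reward_and_generator_step(G, R_hat, graph_reward, g_opt, r_opt,
                              z_dim=32, batch_size=16):
    """One RL step (sketch). G(z) -> (x_hat, a_hat) graphs; R_hat(x_hat, a_hat) ->
    predicted reward; graph_reward(x_hat, a_hat) -> tensor of real rewards from the
    external software (zero for invalid molecules), computed without gradients."""
    z = torch.randn(batch_size, z_dim)

    # 1) Train the reward network with an MSE objective against the real reward.
    x_hat, a_hat = G(z)
    with torch.no_grad():
        real_r = graph_reward(x_hat, a_hat)          # non-differentiable, external
    r_loss = F.mse_loss(R_hat(x_hat.detach(), a_hat.detach()), real_r)
    r_opt.zero_grad()
    r_loss.backward()
    r_opt.step()

    # 2) Train the generator to maximize the (differentiable) predicted reward.
    x_hat, a_hat = G(torch.randn(batch_size, z_dim))
    g_loss = -R_hat(x_hat, a_hat).mean()             # gradient ascent on predicted reward
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return r_loss.item(), g_loss.item()
```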