Making developing chemistry based systems easier - Tim Dudgeon (ChemAxon)

Uploaded by chemaxon on 15.12.2010

ok so firstly this is not a talk about IJC that comes
later this afternoon
this is a talk about software development for chemistry applications
it's a talk for geeks, apologies to the chemists
the basic premise is how to make
ChemAxon based, or more generally chemistry based, applications easier
how to make the development of those applications easier and faster
these are the people who
have been mostly involved with this the first three
come from Argentina and the fourth from Australia so
it is a pretty multinational
project this one
the aim was to investigate how we could make web based
applications for chemistry
and how we could use the best of breed frameworks that are often used in the
IT sector
but are not commonly used
for chemistry applications
these frameworks
they basically do the heavy lifting for you
so you have to write much less code
because the framework does the work for you
and using these frameworks gives you a chance to get real code re-use
rather than the copy paste code which we tend to use nowadays
and the other real benefit of using a framework is that we can insulate ourselves
from the implementation details
and the benefit as I see it here would be that we can avoid the need for, in our case,
specialist JChem developer skills
and any
person who's familiar with developing web based applications or even non web based applications
should be able to create a chemistry based application
they wouldn't need to understand all the details of how JChem works to be
able to create that sort of application
so let's look at an example of how things typically work today, here is the JSP
example that I'm sure you've all seen
just this point here
we've got on the screen the sketcher
where we draw our molecule and over here on the right
we've got a large number of structure search options that need to be specified
and this page really shows the best and the worst of the breed
the sketcher is put together here typically with JavaScript code and this is really
nice, the JavaScript provides an API for how the sketcher should appear and behave
in the browser
really all you have to do is copy and paste this bit of code, re-program it and it will
work in your application
and when the underlying implementation details of the sketcher change
the API provided by the JavaScript will be updated in the ChemAxon library and everything
will still work
that's happened quite recently
it happened
was it six months ago in one of the Marvin upgrades
and if you were using this approach everything worked
if you weren't using it
you had to go in and change your code because of those underlying changes
so this is quite a nice example of how you can encapsulate
the implementation details to some extent
this bit over here on the right is really pretty ugly
this is a tiny fragment of the JSP page
and you can see it looks
like a JSP page looks, really really ugly, we are mixing up database access code
presentation code and everything here
and this is hundreds of lines of code
and if you wanted to do another application that has the same set of options
you've got to copy and paste the whole code
and it gets ugly, it's not nice
the second example is if you look at the DAO level, so this is how you might persist
the structure
in the database
in this case we're updating the structure, this is the method called for updating
the structure
and I'm not going to go into the details of this Java code
but people who have written this code will recognize that there's a gotcha in pretty
well every line of this code, you've got to know what you're doing
there are things that could come and
kick you in the painful bits if you don't
we don't really want to be having to write this level of code
in order to get
our application working
we also don't want to need that level of knowledge to be able to write those applications
so there are various problems here
firstly we have to write tons and tons of code
we do need specialist knowledge, we've got to understand how Marvin works, we've got
to understand how JChem works to be able to write these applications
and there are lots of traps
if you don't do it right you will be caught
because we're writing lots of code we've got to do lots of testing
we probably don't do lots of testing so we have a hole there
also it is difficult to test this code, it's not easy to rigorously unit test
this sort of approach
and when a web application needs to change or migrate it's difficult to re-factor and upgrade
it's a major task to have to deal with those sorts of things
this generally results in the fact that we're using copy and paste coding
rather than getting real code re-use
another problem here is we're strongly coupled
to the underlying chemistry engine, in our case that's probably JChem, and if we wanted
to change even from JChem Base to JChem Cartridge
it would be quite a lot of re-factoring
to make that sort of change, and to change from a JChem Base system to some other similar system
or preferably the other way around
it would be a huge amount of work
so let's take a step back let's look at how Java web based systems have evolved over time
the first way to do things really was when JDBC and JSP
started to be available
that was very early in the days of Java
people tended to be using an approach that is called a model one architecture
and the JSP example is the classic example of this
and basically you end up with a code mess
you can write applications but very quickly they get unmaintainable
and you end up in a mess
quite quickly people realized that they were getting into a mess
and something called model two, also known as web model view controller, came along
exemplified initially by the Struts framework
that improved the architecture significantly
but it was still far from ideal
there was a sort of configuration hell, you had to write so many XML files
a lot of duplication of stuff, it wasn't nearly as productive as it should have been
but over time the model two architecture has improved, there are newer frameworks such as Stripes
or Spring model view controller
that make use of the improvements that have happened, typically convention over configuration
which means you just
put things in the right place and they just work
you don't have to configure it, it just works because you're using the right convention
and DRY which stands for don't repeat yourself
so you configure everything once
not multiple times
so these have provided us with lots of improvements over time but there are still some limitations
in these sorts of frameworks
and then finally more recently there have been some more object oriented
frameworks available
Google Web Toolkit and Wicket are good examples here
these are more object oriented in nature and provide some further improvements
like improved re-factoring support and a generally more object oriented approach
but the world outside Java was also doing things not surprisingly
probably about, was it about
four years ago or so maybe, Ruby and Ruby on Rails really started to become very prominent
and it really set the new standard
Ruby on Rails was a complete framework that handled your complete application
all the way from the database tier up to the presentation tier
it made extensive use
of convention over configuration and DRY
and because of the nature of the Ruby language it was a much more productive environment
you could create an application much faster than you could
using older
approaches like what has been used in the Java world
and Ruby on Rails really took off and became very successful
there are other older and more conventional languages such as Python and PHP
that may be a bit older, maybe
not so fashionable nowadays
but they were certainly still doing things, with frameworks for those that again
followed many of these principles as well
and importantly there are now some newer frameworks appearing
that run on the Java virtual machine that I think are particularly interesting to look at
most notably Groovy and Grails
Scala and Lift
so these are new languages
that run on the Java virtual machine
they're very strongly aligned with Java and they work with
and complement Java very well, so they also complement
the ChemAxon tools very well but they are a lot more productive
so what we wanted to do was to look at
various of these things
and see how we could use them in chemistry
one thing I'm sure
all developers will be aware of
there's a huge amount of choice out there, a huge number of frameworks
that people could use for pretty well everything
whatever category: XML, web frameworks, web components, Ajax, persistence
that's a huge list and this is just a tiny fraction
this can be very confusing
choosing which frameworks to use
is a fairly arbitrary decision because there are so many of them it is difficult to
make an objective
decision on this
we're not saying which are the best frameworks, we've just chosen ones that seem
to work well
and seem to meet the right criteria for us
let's take a step back, let's think about a standard system, this might
be a system we're building for
a business unit, not chemistry based
these sorts of systems generally have very simple data
they handle things like bank accounts, purchase orders, catalogue items, addresses, airline flights
things like that
simple numerical and text data that is relatively easy
to handle
and for these sorts of applications the persistence tier
this is how the data is saved into the database
how it is read from the database
how it is queried
this has been largely automated
using persistence frameworks
or object relational mapping tools, they handle all the database operations and really if you are building one of these standard systems nowadays
you wouldn't even consider writing that persistence tier yourself, you would rather use one of these persistence frameworks, such as Hibernate, to do that for you and make your life a lot simpler
but chemistry software looks to me to be stuck
it's not really benefiting from these recent developments
it tends to be what I call lovingly hand written, we write the low level code ourselves
and we love it but it takes us
a lot of time to do it and it has all of these drawbacks that I mentioned earlier
and this results in poor productivity
why is this? why doesn't chemistry do it the right way, why do we have to do
it in this
painful manner, and can we solve these problems?
so that's really the scenario that we started to look at
and what we did is really an experiment, we said okay it shouldn't be like
this, let's see how we could do it
and see how we can improve things
so with these experiments, can these problems be solved?
so we have to ask why is chemistry difficult, why can't we just treat molecules like a
purchase order
the reason for this is that molecules are very complex
the CRUD operations, these are the persistence operations in the database
the create
retrieve, update and delete operations
and most importantly the query
cannot be expressed as simple SQL statements
because molecules are complex, when we search molecules we need to handle
chemical fingerprints, we have to handle things like that, we can't just do that
as a simple SQL statement
and basically we need a chemistry engine to help
the chemistry engine in our case is JChem
at least
in this context we're talking about Marvin and JChem systems but the same
really would apply to whatever chemistry engine you are using
chemistry applications are typically
mostly JDBC or JSP based, or maybe using some of these
frameworks such as Struts in a more limited manner
they tend to be hand-written
they tend not to use these more modern frameworks
in my experience they tend to use copy and paste code with all the
disadvantages of that and they make very little use of components and re-use
of components
and the result is that they're difficult to understand, they're difficult to upgrade
and they're difficult to test
but does it have to be this way? well that's what we wanted to investigate
we wanted to look at: are there benefits of using languages other than Java?
can persistence frameworks be used with JChem?
can we benefit from modern web frameworks?
can we get real code re-use?
and can we get real database and chemistry engine independence?
so can we just switch out
JChem Base and put in its place JChem Cartridge
can we switch out Accord Cartridge and put in its place the JChem Cartridge
with minimal changes to our application code
so the aim was really to simplify building a chemistry system using some of these best of breed tools
this is the application we built, it's not designed to be a very sophisticated application
it was a proof of principle
just to show that we could build this sort of application with these
frameworks, it is a very simple web based application
we import a number of vendor catalogues
into a simple relational database model, there is a structure table
there is a sample table so each structure has a number of samples
and each sample comes from a particular source, the source being the supplier
so it's a very simple application, it's not supposed to be rocket science in terms of the application
it's also not supposed to be sophisticated and it's not the ultimate in user experience
what we're trying to do is to sort out the technology side of things
and what we built looks like this from the architecture point of view
these are the different tiers of the application but more importantly here on the right
the database we are using, we can just plug in Oracle, MySQL or Derby, probably other
databases as well
the persistence framework we used was Hibernate
and the web framework we used was Wicket
Hibernate was using JChem to do the structure persistence and Wicket was using Marvin to do the structure display
so the ideal situation would be
that the structures don't get in the way
we can look at this first for the other tables, the source and the sample table
because these are just plain data tables
to illustrate this I've got a bit of code here
as if we had written this application using Groovy and Grails
and these are the domain objects for the source class and for the sample class
this is all the code you would have to write
you define properties, you define the relationship between the source table and
the sample table
and then you can just write code
that runs the query
Sample.findByCode and it'll look up that code
this method
in the Grails framework
is automatically generated for you
I haven't written that method
it's generated dynamically at run time
based on the convention that you're using, findByCode
and it looks at the fact that
findBy means I want to do a search, and code, well let's see if I've got a field called code
oh I have, therefore
I'll run that as a query
to delete that sample you just do sample dot delete
so in an ideal world
you end up with code like this which is really really simple to write, really really simple
to use
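The convention based finder described above can be imitated in plain Java. This is only a minimal sketch, assuming a hypothetical `Sample` class and `SampleFinder` interface that are not from the talk; Grails generates its finders against a database, whereas this stand-in searches an in-memory list. It shows the core idea: the developer declares `findByCode`, and the field to search on is derived from the method name at run time.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;

public class DynamicFinderSketch {

    /** Hypothetical plain data class standing in for a row of the sample table. */
    public static class Sample {
        public final String code;
        public final String source;

        public Sample(String code, String source) {
            this.code = code;
            this.source = source;
        }
    }

    /** The developer only declares finders; no implementation is written by hand. */
    public interface SampleFinder {
        Sample findByCode(String code);
        Sample findBySource(String source);
    }

    /** Builds a finder whose methods are resolved from their names at run time. */
    public static SampleFinder finderFor(List<Sample> rows) {
        InvocationHandler handler = (proxy, method, args) -> {
            String name = method.getName();
            if (!name.startsWith("findBy") || args == null || args.length != 1) {
                throw new UnsupportedOperationException(name);
            }
            // Convention: "findByCode" means "match on the field called code".
            String property = name.substring("findBy".length());
            String fieldName = Character.toLowerCase(property.charAt(0)) + property.substring(1);
            Field field = Sample.class.getDeclaredField(fieldName);
            for (Sample row : rows) {
                if (args[0].equals(field.get(row))) {
                    return row;
                }
            }
            return null; // no match
        };
        return (SampleFinder) Proxy.newProxyInstance(
                SampleFinder.class.getClassLoader(),
                new Class<?>[] {SampleFinder.class},
                handler);
    }

    public static void main(String[] args) {
        List<Sample> rows = Arrays.asList(
                new Sample("S-001", "VendorA"),
                new Sample("S-002", "VendorB"));
        SampleFinder finder = finderFor(rows);
        System.out.println(finder.findByCode("S-002").source); // prints VendorB
    }
}
```

A real persistence framework would translate the derived field name into a SQL query rather than a loop, but the naming convention does the same job in both cases.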
but that doesn't handle the structure tables
for structures we've got a problem because we're using JChem Base
JChem Base
is necessary because we want to be able to search, we can't get rid of it, otherwise
we won't be able to search for our structures
and this is a screenshot of the architecture of JChem Base, how it works
in a browser based environment
the issue is that we must use the JChem API
that involves using some classes which may be familiar to developers: ConnectionHandler
UpdateHandler, JChemSearch
so we have to pass our queries through the JChem API but we also need
to use plain SQL, so we need to generate SQL because these become filters
on the JChem search
also if we do things like creating tables then we need to go through the JChem API
again, we can't do that purely with SQL
and so it's not obviously suited
to SQL generation frameworks
such as Hibernate
or JPA
can we solve that problem?
well let's take a look at the cartridge first before we do that
the cartridge is a bit easier because we've
got function calls here
that basically encapsulate the JChem engine
so everything can be done with SQL and that's the real beauty of the cartridge
the issue is that it's not standard SQL, we've got these JChem operators, jc_formula,
jc_compare, jc_evaluate and things like that
so this is non-standard SQL
but we can basically customize our SQL generation in Hibernate
to generate that sort of SQL for us
so with the cartridge it is a bit easier
if we were now extending our Groovy and Grails application
to include a structure table
that was using the JChem Cartridge
it looks a bit like this
quite similar to what we saw before, we see the fields: the structure field, the molecular formula
the molecular weight field
we just have to use a little trick here to say that the molecular formula
is a formula
which is jc_formula of the structure
when the SQL is generated it uses that formula
to retrieve the values for formula and molecular weight
so it's only a slight change to the basic scenario
so for the cartridge we can solve this pretty nicely
it doesn't actually solve query, we still have to do a bit more work
to get query working
but we could use named queries which specify the query at the
SQL level
and because with the cartridge we can do everything with plain SQL
that provides a satisfactory solution for most cases
the problem with the cartridge though is of course that it's only suitable if you want to use Oracle
and only if you are running the cartridge in Oracle
so can we have a more generic solution here
the question is can JChem Base be tamed
so that it can work with Hibernate or JPA
if we do this
can the developer work
with this in a way
that is agnostic to the chemistry engine being used
so can we just switch out one chemistry engine and put in another chemistry engine
and none of our application code needs to be changed
so how do we do this, this is the real geek section here for the next minute or two
we're doing this using the Hibernate event model
this is the Javadoc for this particular package
and there's a whole lot of listeners
that you can attach to Hibernate
which basically you can use to intercept the calls to Hibernate
so either before or after Hibernate does something
you can put on an interceptor
and hook in your code there
there's a whole list of these
which basically let you hook things up to when structures are inserted, deleted, updated
or whatever
and you can hook in your own code to customize how that's going to happen
so
what we do is that we have created something which we call the chemistry
engine bridge
what happens at the persistence level
the user let's say wants to persist a structure in the database
they've got a new structure and they want to save that in the database as a new row
the Hibernate
framework, in this case as the JPA framework
we hook into these events
with the bridge, so the
bridge is notified that Hibernate is wanting to persist a structure
then that engine bridge passes it on to the engine implementation
and says a structure wants to be updated
or inserted
and then the engine implementation
persists that structure
in the way that is appropriate for that particular engine, so in our case using the
JChem Base API
to see this slightly more in action
we are persisting
a structure here, this is at the Hibernate level
so we just say entity manager persist this structure
which just saves that structure into the database
the pre-insert event listener picks this up
so this is the listener that lets you
hook in code before
the structure
would be inserted into the database table
the bridge
calls its onPreInsert method
and then passes that on to the particular engine that's being used
and the engine says create structure
create structure uses the JChem Base API, in this case UpdateHandler
to insert the structure into the JChem table
and when that's done
it is passed back to the Hibernate
layer which basically saves it and then commits the transaction
that will then
commit the SQL
for the other data into the database
so the engine bridge looks a bit like this, we just have intercepting calls for
onPreInsert, onPreUpdate, onPostDelete and onPostLoad
and this just passes each of those on to the particular engine implementation
and this is what the engine looks like, in this case this is the JChem Base engine
we have a method called create structure
and this code looks almost identical
to the code that we saw earlier
because we are using the JChem API to persist it, but the difference is that
this code is now in the library
the developer does not have to write this code
all they have to do
is work at the persistence framework level
this implementation detail is completely hidden from the developer
and a different engine would
do the persistence in a completely different manner
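The bridge and engine split described above can be sketched in plain Java. This is an illustration under stated assumptions, not the code from the slides: `ChemistryEngine`, `EngineBridge`, `Structure` and `InMemoryEngine` are hypothetical names, the callbacks are called directly instead of being registered as Hibernate event listeners, and the engine stores structures in a map where a real one would call the JChem Base API or the cartridge.

```java
import java.util.HashMap;
import java.util.Map;

public class EngineBridgeSketch {

    /** Hypothetical entity holding a structure as a string, for example SMILES. */
    public static class Structure {
        public Long id;
        public String molecule;

        public Structure(String molecule) {
            this.molecule = molecule;
        }
    }

    /** Hypothetical contract that every chemistry engine implementation fulfils. */
    public interface ChemistryEngine {
        void createStructure(Structure s);
        void updateStructure(Structure s);
        void deleteStructure(Structure s);
    }

    /** Stand-in engine: keeps structures in memory instead of a chemistry table. */
    public static class InMemoryEngine implements ChemistryEngine {
        public final Map<Long, String> table = new HashMap<>();
        private long nextId = 1;

        @Override
        public void createStructure(Structure s) {
            s.id = nextId++;
            table.put(s.id, s.molecule);
        }

        @Override
        public void updateStructure(Structure s) {
            table.put(s.id, s.molecule);
        }

        @Override
        public void deleteStructure(Structure s) {
            table.remove(s.id);
        }
    }

    /** Receives persistence events and forwards them to whichever engine is plugged in. */
    public static class EngineBridge {
        private final ChemistryEngine engine;

        public EngineBridge(ChemistryEngine engine) {
            this.engine = engine;
        }

        public void onPreInsert(Structure s) { engine.createStructure(s); }
        public void onPreUpdate(Structure s) { engine.updateStructure(s); }
        public void onPostDelete(Structure s) { engine.deleteStructure(s); }
    }

    public static void main(String[] args) {
        InMemoryEngine engine = new InMemoryEngine();
        EngineBridge bridge = new EngineBridge(engine);

        Structure benzene = new Structure("c1ccccc1");
        bridge.onPreInsert(benzene);   // the engine assigns the id and stores the structure
        System.out.println(engine.table);

        bridge.onPostDelete(benzene);  // the engine removes it again
        System.out.println(engine.table.isEmpty());
    }
}
```

The point of the design is that application code only ever triggers the persistence events; which engine handles them is decided when the bridge is constructed.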
and so this is a structure entity, in this case we'll do it in Java and not in Groovy
we've got
fields for molecular weight, for formula, for getting the samples and things like that
it looks very standard
we can also define in this case a standardizer so that when the table is created we get the right standardization
most of the things can be configured with annotations
so we can define what the structure table is called and things like that, and parameters for other things such as
how the stereo is handled and things like that
this is really the important bit, all the rest is the geeky implementation details
this is what the developer would actually need to do to use this
we've got a method let's say in our service tier
we want to create a structure
what we do is we call entity manager persist
for that structure
if we're going to look up a structure based on its ID
the only thing the developer has to write is entity manager find
so the developer now is working in this completely
agnostic manner, not needing to know what the
underlying implementation details are
and to prove this, one of the aspects is the pluggability
Hibernate isolates you from the underlying database
so you write the same Hibernate code, it doesn't matter if you're using a MySQL database
or an Oracle database
Hibernate handles the translation for you
and our chemistry engine now also isolates you from the underlying
chemistry implementation so we can now swap in and swap out our engines
so this is the only configuration change that you need to make to change
from using JChem Base to using JChem Cartridge
you just need to change the name, the class name of
the chemistry engine in this particular configuration, so with just one
line of code
you've switched from using JChem Base to using JChem Cartridge
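Switching engines by changing one class name in configuration can be sketched as follows. This is a minimal illustration, assuming a hypothetical property key `chemistry.engine.class` and two placeholder engine classes; the real configuration file and class names are not shown in the transcript.

```java
import java.util.Properties;

public class PluggableEngineSketch {

    /** Hypothetical property key naming the engine implementation class. */
    public static final String ENGINE_KEY = "chemistry.engine.class";

    /** Hypothetical engine contract, reduced to a name for the illustration. */
    public interface ChemistryEngine {
        String name();
    }

    /** Placeholder for an engine that would use the JChem Base API. */
    public static class BaseEngine implements ChemistryEngine {
        @Override
        public String name() { return "base"; }
    }

    /** Placeholder for an engine that would generate cartridge SQL. */
    public static class CartridgeEngine implements ChemistryEngine {
        @Override
        public String name() { return "cartridge"; }
    }

    /** Instantiates whichever engine class the configuration names. */
    public static ChemistryEngine load(Properties config) {
        String className = config.getProperty(ENGINE_KEY);
        try {
            Class<?> type = Class.forName(className);
            return (ChemistryEngine) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load engine " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();

        config.setProperty(ENGINE_KEY, BaseEngine.class.getName());
        System.out.println(load(config).name());

        // The only change needed to swap engines is this one configuration value.
        config.setProperty(ENGINE_KEY, CartridgeEngine.class.getName());
        System.out.println(load(config).name());
    }
}
```

Because the application only depends on the `ChemistryEngine` interface, no application code has to change when the configured class name changes.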
query is a bit more complex, I've got a very simplified representation
of this here and I don't really have time
to go into
it in detail
we've got to do a bit more work here
to enable query to work, it's not trivial because of the way
we have to pass the query through the JChem Base API but also use SQL generation
to generate
a SQL filter that is passed on to
the JChem API
but the implementation basically deals with that
we have an object oriented representation of the query that you sketch in
the browser
that then gets inspected and translated into a form that can be executed by the particular engine
the results you get back are a hit list
of the IDs
it's not
fully loaded entities which would be a performance problem
you just get a list of your hits and then you can go back to the database to retrieve
the actual data for the structures or whatever for those particular hits
so there really isn't time here to go into query in detail
but the engine implementations
support querying
as well as the basic persistence stuff
another aspect of this is where we're using the Wicket framework for doing
the presentation tier
and the service tier
just to illustrate the benefits of using these sorts of frameworks
let's say we wanted to expose certain
parts of this application as a web service
well this is the entire code that is needed to do this
firstly you have to enable
RESTful web services, and this is the configuration change
you need to make
in the
container level
configuration file, basically just to enable the web services
then to expose the functionality in our service tier
this is the method in our service tier
we just have to provide the right annotations
that basically enable the web service
and then to use it, here is a bit of Groovy code
that is actually calling the web service, that's the URL

so web services we don't have to develop web services
all we need to do is turn them on really
which makes a huge difference

so just to show that this isn't all
smoke and mirrors
there is something behind it, some real code that runs
this is the application

so we've got a very basic way of browsing through the structures, we can select
a structure here and see which vendors it's available from
most importantly we want to be able to search so we go on to the search tab here
and here we've got a familiar
page where people
can specify the search type
we can
enter our substructure
run it
and we get our search results back, so all
of this is going
through the frameworks I've just described, the persistence is done by Hibernate
the engine implementation here is the JChem Base engine
the database that is being searched in this case
is a Derby database
but all of those things are pluggable and can be switched out
as I said it's not designed to be a sophisticated application, it's just a proof of principle
that the technology
does work
and is usable
also I should add there are Wicket
components here for encapsulating the sketcher and for the search options
so these components now can just be
re-used in other applications so we don't have to copy and paste the whole ugly code
that we saw before
this is just a component that you can include in your web page and all the options
then reappear
in your next application
so in summary
did we succeed? well from the persistence point of view yes we pretty clearly
did succeed here
the approach seems to be successful, we have created a meaningful but yet simple web application
that uses these JPA interfaces in a very clean manner
so any developer
who's familiar with using Hibernate or JPA
could develop these sorts of applications
they wouldn't need to understand anything about JChem
or anything about Marvin
we've given
ourselves some vendor agnosticism
in that at least we can switch between JChem Base and JChem Cartridge
we haven't yet tried switching between other vendors' engines, that's going to be more difficult
but I think that it should be possible
given that the
engine is pluggable
in the web tier we haven't really done anything like as much, most of the work
and the experiments we've done
are on the persistence tier, the web tier is relatively limited
we've got a small number of Wicket components like the one for the sketcher and the one for the search options
the reason
why we haven't done so much here is because there are so many web frameworks
that we don't really know whether we are using the right one, I mean we like Wicket
but you might not
and each company is probably going to have their own
favourite framework to use
and so it's probably not worth our while investing a lot of time in a
particular web framework here
unless there is some sort of consensus
that that is the one to use
among many of you
so the components that we are generating, some of this is likely to be short term
while some of it is more medium term
the engine architecture and implementations, that is for JChem Base and JChem Cartridge
those are developed, and we're also generating a Groovy library
that basically encapsulates JChem
so if you're using Groovy, and for any Java developer who hasn't
looked at Groovy I really do recommend you do, it complements Java really really nicely
in terms
of web components
we could create
components for any number of frameworks
standard JSP tags, GSP for the Groovy and Grails framework
whatever, but we don't really know what the demand would be here, which frameworks people
really want to use
if you've got any particular suggestions or comments in this area
we would certainly like to hear them
so the components we could create and have
at least for Wicket are the sketcher component, ones for displaying structures
the table component that displays the search results
and the search options, probably a few more, and we can create those as well
and we're open to suggestions for what might be useful
what will be available? first I should say these are experimental tools, they're not ready
for a production environment yet
we need to do more work to make these usable in a production environment
we'd be interested
to find out whether you think that is something we should
be doing
we're likely to use those for maybe creating internal systems and maybe for consultancy
projects as time goes on
and the components may become available as extensions to JChem if there is a demand
so in a
way quite similar to how web services are provided at the moment as an extension to JChem
we might provide these frameworks as extensions to JChem as well for customers wanting to
use this sort of thing
we're looking for people who have experience, so I know
some people have started looking at this sort of thing as well
we would
welcome hearing
from people who might be interested in taking this sort of approach and using these things
so that's it
thank you for your time
have we got time for any questions?
first of all thank you that's a great talk
and we have a question
so for the structure
annotation it looks like you had the table type annotation
does that mean there are some more classes you added to be able to create the JChem tables?
I believe this is the slide where we have annotations up here
that basically specify the name of the table, the JChem table that
is being created
we've got an annotation here for the type of table, in this case we're using a molecules table
though we could specify a Markush table
or a reactions table
so yes we've got some
parts of the framework that handle the DDL aspects of this
Hibernate can create the tables for you
it's no good if it only creates the non-structure tables for you
so we've got hooks in here as well
so that it reads these annotations
and then it creates the JChem table
with those particular attributes: the right table type, the right standardizer configuration
the right
stereo options, things like that
so you start with an empty database
and Hibernate will create all of those tables for you
and with the connection stuff that you integrated
just to confirm, the slide was kind of quick, the connection that was being passed in
that was from
that was the active connection with the open transaction right? so rollbacks and everything would be
yes that's the connection taken from the Hibernate session
so it's done within the same transaction that Hibernate is managing