Scanning files takes way too long!

EagerEyes
Joined: 07/10/2008

It looks like you're parsing the files when doing the "scanning," which takes a lot longer than it should. Instead, it would make a lot more sense to only look for the molecule names and forget about the rest, and only parse the files when the user wants to see details or a 3D model. There is some sample code in the SDK that shows how to "hydrate" and "dehydrate" objects; it has to do with books ...

And speaking of that example, storing the most important information about each molecule in a database would also help make things faster, then the PDB file would only need to be parsed when the molecule has to be drawn. That saves a lot of memory, too.
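To make the "only look for the molecule names" idea concrete, here is a minimal sketch in Python (not the app's own language). It assumes the name comes from the PDB TITLE records, which sit before the coordinate records, and it stops reading at the first atom. The function name `scan_title` and the sample entry are made up for illustration.

```python
import io

def scan_title(lines):
    """Return the title from PDB header records without parsing any atoms."""
    parts = []
    for line in lines:
        record = line[:6].strip()
        if record == "TITLE":
            # Columns 11-80 hold the title text; long titles continue
            # on further TITLE lines.
            parts.append(line[10:80].strip())
        elif record in ("ATOM", "HETATM", "MODEL"):
            # Coordinates start here, so there is nothing more to scan.
            break
    return " ".join(parts)

# A made-up entry, not a real PDB file.
sample = io.StringIO(
    "HEADER    EXAMPLE\n"
    "TITLE     EXAMPLE PEPTIDE FROM A\n"
    "TITLE    2 HYPOTHETICAL VIRUS\n"
    "ATOM      1  N   ASN A   1      -8.901   4.127  -0.555  1.00  0.00\n"
)
print(scan_title(sample))
```

The full parse would then run only when the user opens a molecule.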

Brad Larson
Joined: 05/14/2008

Agreed. What I have is a quick-and-dirty implementation.

I plan to parse the protein only once on download, then store the metadata like you say in an SQLite database. I may even buffer the vertex buffer objects used for the OpenGL rendering to allow for near-instantaneous rendering, if they don't take up too much disk space.
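As a rough illustration of the "parse once, then store the metadata" plan, here is a sketch using Python's built-in sqlite3 module. The table layout, column names, and function names are assumptions made for this example and are not taken from the Molecules source.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the metadata database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS molecule ("
        " filename TEXT PRIMARY KEY, title TEXT, atom_count INTEGER)"
    )
    return db

def remember(db, filename, title, atom_count):
    """Store metadata once, right after the file is downloaded and parsed."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO molecule VALUES (?, ?, ?)",
            (filename, title, atom_count),
        )

def list_titles(db):
    """Build the file list from the database instead of re-parsing files."""
    rows = db.execute("SELECT title FROM molecule ORDER BY title")
    return [row[0] for row in rows]

db = open_store()
# Made-up entries for illustration only.
remember(db, "example1.pdb", "Hypothetical peptide", 120)
remember(db, "example2.pdb", "Another made-up entry", 4500)
print(list_titles(db))
```

With this layout, listing the molecules is a single query, and the PDB file is only touched again when a structure has to be drawn.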

Again, this was not implemented due to the time constraints of getting it in the store on day one. I could have waited, but I wanted people to be able to play with something while I worked on improvements.

I should have the code up sometime today, so you'll be able to read through it and cringe at my makeshift implementation.

EagerEyes

I'm working on a visualization application that has some similarities (it's for categorical, multi-dimensional data, not molecules, though), so I've had to do something very similar. Storing the VBOs in the DB is a good idea; that would make it blazingly fast. I've found SQLite to be surprisingly quick: reading in 30,000 rows with a few fields takes only a small fraction of a second (I haven't done any precise timing because it's simply fast enough for my purposes).
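For what it's worth, one way to keep vertex data in the DB is to pack the floats into a single BLOB per molecule. The sketch below uses Python's sqlite3 and array modules; the table name `vertex_cache` and both helper functions are hypothetical, and a real OpenGL buffer would also need index data and a fixed byte order.

```python
import sqlite3
from array import array

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vertex_cache (filename TEXT PRIMARY KEY, vertices BLOB)")

def save_vertices(filename, coords):
    """Pack a flat list of floats into 32-bit values and store them as one BLOB."""
    blob = array("f", coords).tobytes()
    with db:
        db.execute(
            "INSERT OR REPLACE INTO vertex_cache VALUES (?, ?)", (filename, blob)
        )

def load_vertices(filename):
    """Return the cached floats, or None if nothing was cached for this file."""
    row = db.execute(
        "SELECT vertices FROM vertex_cache WHERE filename = ?", (filename,)
    ).fetchone()
    if row is None:
        return None
    out = array("f")
    out.frombytes(row[0])
    return out

save_vertices("example1.pdb", [0.0, 1.5, -2.25, 3.0, 0.5, 1.0])
print(list(load_vertices("example1.pdb")))
```

Reading the BLOB back gives data that could be handed to the renderer directly, with no parsing step in between.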

Are you going to put the project on SourceForge? I'd be interested in contributing. I've done quite a bit of molecule visualization for a product called LigandScout (most of the 3D graphics you see there are drawn by my code). That was in Java, and it's also been a few years, so I've forgotten quite a few things, but I should still be able to help.

Brad Larson

Wow, that's pretty good performance on the SQLite stores. That definitely sounds like the way to go.

I'm looking into hosting an SVN repository here, coupled with a bug tracking and feature request system based on Drupal.org's Project module for Drupal. Unfortunately, that module isn't quite ready for Drupal 6 yet.

LigandScout is graphically stunning, and I'm sure that the hard parts (geometry calculations, possibly even OpenGL setup) would translate across well. A couple of people have said that with a ribbon model visualization, this would become a serious scientific tool, and I've never done the graphics for something like that before, so any help you could provide would be greatly appreciated.

Unfortunately, we got another reminder yesterday that we're still under NDA and not to discuss code until they tell us, so I'm still awaiting approval to release the source in any form.

EagerEyes

I won't be able to contribute until about mid-August, so there's no rush (at least not from my side ;). I have to get out my own app and do a ton of other things. But I would love to take a crack at the ribbons.

Once the nucleic acids and their parts are identified, the calculations for the ribbons aren't particularly difficult. I used a few heuristics to avoid very short strands, but other than that, I remember it being fairly straightforward (though I did have help from an expert). It certainly took quite a bit of experimenting to get the parameters right, avoid 180° twists in the ribbon (which were a result of how the normal vectors of the "ribbon plane" were calculated), etc.
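To illustrate the twist problem: one common fix (not necessarily what was done in LigandScout) is to compare each ribbon-plane normal with the previous one and flip it when the two point in opposite directions, that is, when their dot product is negative. A small Python sketch, with the function name `align_normals` made up for this example:

```python
def align_normals(normals):
    """Flip any normal that points against the (already aligned) one before it."""
    aligned = []
    for n in normals:
        if aligned:
            prev = aligned[-1]
            dot = sum(a * b for a, b in zip(prev, n))
            if dot < 0:
                # Pointing the opposite way: negate it so the ribbon
                # does not twist by half a turn between the two points.
                n = tuple(-c for c in n)
        aligned.append(tuple(n))
    return aligned

# The second normal is reversed relative to the first and gets flipped back.
print(align_normals([(0, 0, 1), (0, 0, -1), (0, 1, 1)]))
```

Because each normal is compared with the corrected one before it, a single flip early in the chain does not cascade into further flips down the ribbon.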

But for large molecules like proteins, I don't think it makes much sense to show individual atoms. When zooming in, they could be shown, but not as the default view. Perhaps as a bit of a stop-gap solution, it would make sense to include a number of small organic molecules that are a bit more interesting than H2O, but can also be shown nicely with point-and-stick?

jurgenfd
Joined: 08/18/2008

I'm new to my iPod touch.
When I start the 'scanning files' step, it returns to the main menu.
Does that mean I crashed Molecules?
I loaded PDB entry 1brv into the list; maybe the parsing is having a problem with that.
It did this in exactly the same way in the previous version and in the version I got today.

Thanks for this great app!

Brad Larson

jurgenfd wrote:
I'm new to my iPod touch.
When I start the 'scanning files' step, it returns to the main menu.
Does that mean I crashed Molecules?
I loaded PDB entry 1brv into the list; maybe the parsing is having a problem with that.
It did this in exactly the same way in the previous version and in the version I got today.

1BRV is an NMR structure with 50 overlapping models. Molecules in its current state is known to have problems with these types of structures, especially as they get above 10,000 atoms or so. That could be the source of your troubles.

I'm making good progress on replacing the backend data model with SQLite and one of the things that I'm adding is proper handling of these overlaid models. Hopefully, that will clear up these issues and remove a few of the other crashing bugs people are running into with large structures.

Brad Larson

OK, I just submitted a new version of Molecules for review that has had its data model completely rewritten to use SQLite: http://www.sunsetlakesoftware.com/2008/09/21/molecules-12-submitted-review . This should dramatically reduce the loading and rendering times after the initialization of the database on first launch.

Also, NMR models like 1BRV will only render their first structure and shouldn't jam up the workings any more. I'll be adding a means of selecting which structure to view in these models in the next version.
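As a sketch of what "only render the first structure" can look like at the file level: multi-model PDB files wrap each model in MODEL and ENDMDL records, so a reader can stop at the first ENDMDL. The Python below is an illustration under that assumption and is not the code that ships in Molecules; `first_model_atoms` and `fake_atom` are made-up names.

```python
def first_model_atoms(lines):
    """Collect x, y, z for the atoms of the first model only."""
    atoms = []
    for line in lines:
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            # Columns 31-54 hold x, y, z as three 8-character fields.
            atoms.append(
                (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            )
        elif record == "ENDMDL":
            # End of the first model: ignore all overlapping models after it.
            break
    return atoms

def fake_atom(x, y, z):
    """Build a minimal ATOM line with only the coordinate columns filled in."""
    return "ATOM".ljust(30) + "%8.3f%8.3f%8.3f" % (x, y, z)

lines = [
    "MODEL        1", fake_atom(1.0, 2.0, 3.0), "ENDMDL",
    "MODEL        2", fake_atom(1.1, 2.1, 3.1), "ENDMDL",
]
print(first_model_atoms(lines))
```

A file without MODEL records simply reads through to the end, so single-structure entries behave as before.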

Thank you for your advice and your patience.

Anonymous

Molecules is just what I'm looking for. My brother's birthday is close, and I always get a headache trying to find a good gift for him. Thanks to you, it's possible now. He is a pharmacobiological chemist and will be excited by something like this. Maybe he doesn't like iPods, but with an application like this he will love it. Thank you.

Anonymous

Very useful for me. Thanks!
