Atom C++ Mac

  

The holidays are here again, and usually they just come and go without much memory. This Thanksgiving for our family was special for several reasons. This is the first Thanksgiving my wife and I have had as new parents, my wife got to come home for the first time in five months riding in a car, and we got to spend it at our house with the family. Shawnda got to spend time in the nursery with the twins, and when it came down to dinner, she ate the best I have seen in a long time. Since that day, my wife has really been making some big strides. She is eating well again after several weeks, and working hard at therapy. She went to the doctor last week to have a procedure performed to help with extending her left arm, and it was extremely successful. She is working on keeping it good and stretched on her own. She is also sitting up well in the wheelchair that she uses considering that it is made for someone much bigger than she is. Next week, she will be going back to acute rehab, and they are going to fit her for a wheelchair that fits. I am still shocked at how well she remembers names. I still have to ask her who someone is because I am terrible about remembering things. In general, she has had a phenomenal attitude during the past five months. I am so proud of her for staying in good spirits, and for working so hard. I really can't wait until she is back home so that she can spend some time really loving on the babies (and me.) And we can do the same in kind.

  1. Atom C++ 実行 Mac
  2. Cmac Transportation

From Thanksgiving 2008

CMac.ws Directory Address information, maps, driving directions and links for 15,000,000+ businesses in the USA! Subscribe to: Post Comments (Atom) About Me. Atom helps you write code faster with a smart and flexible autocomplete. File system browser Easily browse and open a single file, a whole project, or multiple projects in one window. Oct 05, 2016 ArubaOS-CX 10.02 on Intel Atom C2538 Expand. TDES-CMAC Expand. ArubaOS-CX 10.02 on Intel Xeon D-1518 Expand. TDES-CMAC Expand. ArubaOS-CX 10.02 on Intel Xeon D-1527.

前提・実現したいことここに質問の内容を詳しく書いてください。(例)PHP(CakePHP)で なシステムを作っています。 な機能を実装中に以下のエラーメッセージが発生しました。 発生している問題・エラーメッセージMacのAtomでscanf関数を使おうと思って実行してみましたが、何の反応も起き. Download Atom for Windows now from Softonic: 100% safe and virus free. More than 794 downloads this month. Download Atom latest version 2021.



The twins have been in Huntsville for a very long time. They caught their first cold a week before Thanksgiving, and I thought it would be best to limit their exposure to environments where they can catch anything more serious. My wife asks about them all the time, and I miss them being around because they are always doing something new and fascinating. The amazing thing about them is that when they caught a cold, instead of crying and being in a generally poor mood, they learned how to roll around. The only way I knew something was wrong is because their breathing was unusual, and the younger one had a temperature. They are getting really big now, and starting to really develop their own personalities. Madison is really fun loving; she smiles and talks to everything. Morgan on the other hand is really laid back, and likes to absorb her environment.


This Christmas, my gift will be two lovely children, and a beautiful wife. It is truly miraculous to see where we were five months ago to where God has delivered us to today. On June 19; I was worried that my wife may not survive, and whether our unborn children would survive. Today, I see two delightful children, and have the unique honor of being married to one of the most beautiful and courageous women around. She fought for the lives of three people, and they are all doing wonderful. I thank the Lord for watching over our family, and for us growing stronger as a couple, and adding to our family.
Well, I started back to work this week also. I can't believe it has been over two months since I had a chance to post anything. All is going well, and I hope that you and your family are blessed during this holiday season.

Clustering is an invaluable cheminformatics technique for subdividing a typically large compound collection into small groups of similar compounds. One of the advantages is that once clustered you can store the cluster identifiers and then refer to them later this is particularly valuable when dealing with very large datasets. This often used in the analysis of high-throughput screening results, or the analysis of virtual screening or docking studies.

In theory any molecular descriptor can be used in clustering, in practice the use of a large number of descriptors, in particular some of the more esoteric ones, can make it difficult for a medicinal chemist to understand why compounds might be in the same cluster. It is perhaps better to stick to molecular fingerprints based on structural fragments, this usually results in clusters that chemists can see are similar. For this work I've tried to stick with Morgan/Circular type descriptors, however since all the packages will have their own implementation they are probably not exactly the same. A number of clustering techniques are available but the most common are nearest neighbours, this method assigns a single compound to each of the desired number of clusters, ideally the most diverse selection. Next each compound remaining is assigned to the nearest cluster, the centroid of each cluster is recalculated as each new compound is assigned to the cluster.

I've looked at a number of options for clustering molecules from toolkits like RDKit to commercial applications such as Vortex and I've tried them with a variety of different sized data sets, one containing 789 molecules, one 150,000 molecules and a large set containing 4,400,000 molecules. The clustering tools often have many different options but I've tried to stick to the default values. All work was run on a 12-core MacPro desktop with 64GB of RAM with 1TB of internal SSCD storage. I'm sure there are other options but I've limited the work to those that have in built support for chemistry. I've also limited the investigation to applications that are run locally since I suspect most chemists won't want to run calculations on an external server.

RDKit

RDKit is an open source toolkit for cheminformatics, it has Core data structures and algorithms in C++ with a user friendly Python wrapper. It also integrates nicely with Jupyter Notebooks. The RDKit Cookbook contains tips for using the the Butina clustering algorithm D Butina, 'Unsupervised Database Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way to Cluster Small and Large Data Sets', JCICS, 39, 747-750 (1999) and I've implemented it in a Jupyter Notebook.

Now read in the molecules

This dataset contains < 1000 molecules, now generate fingerprints

Then cluster

For a modest dataset of <1000 molecules the clustering is almost instantaneous , we can then look at a specific cluster

AtomAtom C++ Mac

Using the larger dataset of 150,000 molecules, reading in the structures took 90 seconds and 3.8 GB RAM, adding the fingerprints took another 20 secs and took RAM usage up to 3.83GB. Clustering used only one core, and after 23 mins and consuming 80GB RAM the kernel died.

After chatting to Greg Landrum, he suggested if using a large set of molecules you wouldn't want to ever have all the molecules in memory, so it might be better to read the molecules and generate the fingerprints in a single step like this

Also for the clustering use

The reordering flag is documented as follows 'if this toggle is set, the number of neighbors is updated for the unassigned molecules after a new cluster is created such that always the molecule with the largest number of unassigned neighbors is selected as the next cluster center'.

Whilst this lowered the initial memory requirements, unfortunately once the clustering started the memory usage steadily increased and the kernel died.

Chemfp

Chemfp is a set of command-line tools and a Python library for fingerprint generation and high-performance similarity searches. Whilst the current version is commercial older versions are available for free and can be installed using PIP. However it seems you have to explicitly define the version to install

This installs a variety of command line tools.

The command-line tools are written in Python, with a C extension for better performance. The Python library has a large and well-documented public API which you can use from your own programs. The core functionality works with fingerprints as byte or hex strings, reads and writes fingerprint files, and performs similarity searches. Chemfp does not understand chemistry. Instead, it knows how to use RDKit, OEChem/OEGraphSim, and Open Babel to handle molecule I/O and to compute fingerprints for a molecule, and makes all of that work through a portable cross-toolkit API.

The help is incredibly well documented, for example to generate the fingerprint file we need to use rdkit2fps, typing

Generating the fingerprint file

Andrew Dalke has provided a python script to help with the clustering and give some information on timings etc. it is available from here https://www.macinchem.org/reviews/clustering/clusteringfiles/taylorbutina.zip

Running on the ApprovedDrugs 789 dataset with a similarity threshold of 0.6 was effectively instantaneous

The resulting cluster file can be opened in a text editor and has this format

With some Python scripting it should be straight-forward to edit the script to output to cluster numbers and combine them with the original sdf file that was used to generate the fingerprint file.

Running the same process with a larger dataset took an unexpectedly long time and I noticed that it was only using one core. Chatting with Andrew Dalke it was apparent that Clang, the 'LLVM native' C/C++/Objective-C compiler, under MacOSX, doesn't include OpenMP support. Chemfp uses OpenMP for parallelism.

Compiling Chemfp for OpenMP

To compile Chemfp for OpenMP you'll need to install gcc then use the CC environment variable to configure the build step to use that compiler. I've written up instructions for installing gcc (plus other things) here using Homebrew.

I had to first remove Chemfp

C atom macAtom C++ Mac

However, you'll have to remove your local pip wheel file. A 'wheel' file is the new(ish) Python distribution format. It can include pre-compiled code, including byte code. 'pip install' internally makes a wheel file, stores it in a cache, then installs from the wheel file. The 'pip uninstall' will remove the installation butnot the wheel file.

Then reinstall Chemfp

I then ran the same exercise with the 150,000 structure file, this time running at similarities of 0.8 and 0.6 to see how it might effect performance, generating the fingerprint file took very little time and the timings for the clustering are shown below.

took just under a minute for 0.8 threshold and made use of all available cores.

took just over a minute at 0.6, so a little slower. As before the cluster output was a text file with the same format.

I had some problems generating the fingerprint file for the 4.4 million structures, it is often a problem when a file is derived from mulitiple sources and quality control is not consistent.

I ended up tidying up the file using this script.

This gave me a data set in SMILES format of just over 4.3 million structures. I then regenerated the sdf file and used that to generate the fingerprints without incident. (Apparently you can add '--errors ignore' to the rdkit2fps command, I've not tested it).

Generating the fingerprint file for the 4.3 million structures took 35 mins, generating the clusters using a 0.8 similarity took 10 hours. The python script also outputs a useful summary.

Atom C++ Mac

DataWarrior

DataWarrior is an open-source data analysis package I've reviewed previously. The cluster algorithm implemented in DataWarrior is simple, reproducible, but computationally demanding. By default Datawarrior uses Fragfp fingerprints

FragFp is a substructure fragment dictionary based binary fingerprint similar to the MDL keys. It relies on a dictionary of 512 predefined structure fragments.

To get a better comparison with the other clustering algorithms I generated the SphereFp fingerprint.

Atom C++ 実行 Mac

SphereFp descriptor encodes circular spheres of atoms and bonds into a hashed binary fingerprint of 512 bits. From every atom in the molecule, DataWarrior constructs fragments of increasing size by including n layers of atom neighbours (n=1 to 5). These circular fragments are canonicalized considering aromaticity, but neglecting stereo configurations. From the canonical representation a hash code is generated, which is used to set the respective bit of the fingerprint.

One the dataset of 789 molecules the clustering was instantaneous, and it was very easy to browse through the results.

Reading in the 150,000 structure file took 20 seconds and consumed 550 MB of RAM, adding the descriptors took the RAM usage up to 1.14GB. Unfortunately when I tried to do the clustering I got this message.

MOE

MOE is a molecular modelling package from Chemical Computing Group. Whilst MOE does not include Morgan/circular type fingerprints there is a SVL script based on the work by Rogers et al DOI that can be used to generate them. One the small dataset of 789 structures the clustering was complete within 15 seconds. One the larger set of 150,000 molecules generation of the fingerprints took 2 mins and the subsequent clustering took (>40 hours and counting)

I did not attempt to cluster the 4.4 million structure file using MOE.

Vortex

Vortex is a high performance, interactive data visualization and analysis application, with native cheminformatics and bioinformatics intelligence built in. Reading in the 789 structure file and then clustering was instantaneous. Vortex provides k-means clustering as standard, and uses either atom-based similarity (called DotFPCA) hex-packed to 1024 bits. This is the Dotmatics implementation of Morgan/circular type fingerprints.

Working with the 150,000 structure file, again using the default options, took less than a minute to generate the fingerprints and complete the clustering.

Using the 4.4M structure file took a little longer, reading in the file took less than 3 mins and consumed 13GB of RAM. Generating the fingerprints took 18 mins. used 4 cores and took the RAM usage up to 16GB, the clustering then took a further 40 mins using available cores with the RAM usage rising to 17.5 GB. Two columns are added: a CLUSTER number, and a DISTANCE; the centroid has a distance of 0 and all other compounds vary in distance up to 1.

Cmac Transportation

With over 4 million structures it is not really practical to simply scroll through the table, but here are a couple of tools that might help with further analysis. A question that might then arise is “How many molecules belong to each cluster?”, the GenericClusterAnalysis script creates a new workspace containing two columns. The first containing the cluster number the second the count of occurrences for each cluster, for this 4.4 million structures it took a couple of seconds to run.

Alternatively you can use a variation on the duplicate check script to select the first example from each cluster, you can download it here http://macinchem.org/reviews/vortexscripts/ChoseOneFromClusters.vpy.zip. An alternative is to select the centroid of each cluster by modifying the script to perform an additional selection based on the DISTANCE value, the script can be downloaded here http://macinchem.org/reviews/vortexscripts/ChoseCentreFromClusters.vpy.zip.

Dan Ormsby at Dotmatics provided me with a couple of script snippets that expose the fingerprint generation code, it turns out that under the hood there are a wide variety of fingerprint options the default options are all pre-set for the typical medicinal chemist user. I used the 150,000 molecule set to explore the impact of varying a couple of the options. Changing the atom path length in the fingerprint from 2 to 10 had minimal impact on the time taken, all less than 1 minute. To improve efficiency when working with these fingerprints they are usually delivered into Vortex in a hex packed format, the default is 1024 bits. I also did the clustering using the unpacked fingerprints since this would represent the slowest method. Using the unpacked fingerprints the run took two minutes to complete.

Summary

For small data sets of a few thousand structures there are multiple options for clustering, from Open Source toolkits to sophisticated desktop applications. However as the dataset increases the computational demands increase, and support for the use of all available cores becomes essential. For very large datasets I've only found two options Chemfp and Vortex. I've only used the free version of Chemfp (the commercial version may have better performance) and whilst it copes with several million structures it is very considerably slower than Vortex. Vortex also has the advantage of having the nice graphical user interface and scripting tools to allow further exploration of the dataset.

Last Updated 22 July 2017