Chemists generally depict molecules as Lewis structures, a concept that has carried over into the computer representations in use since the 1970s and 1980s. These representations still form the basic records of most databases containing chemical information. Since the end of the 1980s, such databases have been used to construct predictive models with neural networks and other machine learning methods, as well as various group contribution methods based on some form of regression.
The scientific community is now confronted with the hype surrounding Big Data and new ML/AI methods, such as deep learning, that often run counter to the established validation concepts of the cheminformatics community. Chemistry is still entering the digital era, and as a discipline it faces a problem not found in some of the more commonplace applications of ML/AI, for example those of large e-commerce companies, real-time analysis and simulation in the automobile industry, and data from large spectrometers and colliders such as the LHC.
The central element in chemistry, i.e. the basic identifier/model for a molecule, is no more than metadata that describe the apparent connectivity between atoms in a formalistic manner as single, double, and triple bonds. Furthermore, chemical data are generally not measured directly in real time, and the findings are often subject to human selection and interpretation based on a predetermined bonding model. The distinction between measurements and the experimentalists’ interpretation of those measurements is far less clear in chemistry than in many other branches of science and engineering. For instance, 13C chemical shifts were for years believed to be equivalent to net atomic charges, a concept without physical reality.
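To make the notion of a connectivity-based molecular record concrete, the following is a minimal sketch in Python of a connection table of the kind described above. The `Bond` class, the `formal_valence` helper, and the choice of acetic acid as the example are illustrative assumptions and do not correspond to any particular database format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bond:
    """One entry of a hypothetical connection table."""
    a: int      # index of the first atom
    b: int      # index of the second atom
    order: int  # formal bond order: 1 (single), 2 (double), or 3 (triple)


# Heavy atoms of acetic acid, CH3-C(=O)-OH; hydrogens are left implicit.
atoms = ["C", "C", "O", "O"]
bonds = [
    Bond(0, 1, 1),  # C-C single bond
    Bond(1, 2, 2),  # C=O double bond
    Bond(1, 3, 1),  # C-O single bond
]


def formal_valence(atom_index, bonds):
    """Sum the formal bond orders of all bonds that touch one atom."""
    return sum(b.order for b in bonds if atom_index in (b.a, b.b))


# Formal valence of each heavy atom, counting only the explicit bonds.
valences = [formal_valence(i, bonds) for i in range(len(atoms))]
```

Nothing in such a record is a measured quantity: the atom list and the integer bond orders are assigned by convention, which is the sense in which the record is metadata.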