a medical university
Development of a new tool for healthcare company analysis
CMI Corporate Database
Jonas Lundberg
Supervisor: Associate Professor Carl Johan Sundberg
Master thesis in medical science with a major in biomedicine
Centre for Medical Innovations, Karolinska Institutet, Sweden
SUMMARY
ABBREVIATIONS
DEFINITIONS
PURPOSE OF THE THESIS
INTRODUCTION
    The biotechnology industry in Sweden
    Technology Transfer at Universities
    A corporate database as an analytical tool
    Thesis disposition
MATERIALS AND METHODS
    BUILDING A DATABASE
        Data analysis
        Database Integrity
    LEGAL ASPECTS
    CMI CORPORATE DATABASE 1.0
    CMI CORPORATE DATABASE VERSION 2.0
        What information do we need about companies?
            Company information
            Classification of companies
        Structure
        Creating new forms
        Creating a search function
    CMI CORPORATE DATABASE 2.0 - PROBLEMS AND SOLUTIONS
        The problems
        The solution - creating a relational database
            Relational databases
            Relations in CMI Corporate Database 3.0
            A new search function
            Making analysis – exchanging information with Excel
RESULTS
    THE USEFUL TOOL – CMI CORPORATE DATABASE 3.0
        User interface
    ANALYSIS OF TWO EXAMPLES
        Companies owned by Karolinska Institutet
        R&D Biomedical companies in Sweden
DISCUSSION
REFERENCES
Summary

Strategic decision-makers in government, universities and companies sometimes need to analyse the development of the biomedical industry in Sweden. This requires a tool that facilitates the storage and further analysis of data. The Technology Transfer units at the universities also need such a tool to be able to do an optimal job. This thesis describes the creation of a database tool at Karolinska Institutet. The database was developed in three versions: the first two were based on a simple database model, and the third and final version on a more complex relational model.

The conclusions of this project are that the following considerations should be made when creating a corporate database for analytical and practical use:

1. Analyse which alternatives exist to creating your own database. It may be simpler and cheaper to use a paper-based system or an existing database.
2. Evaluate which data you need to store in the database.
3. Analyse how you will make the database complete and valid (give it integrity) and how you will present information about database integrity to the users.
4. Make a detailed study of how the data are composed, so that you know how to split the information into data units that are as small as possible and how to store them.
5. Draw an information flow chart to understand how data will be treated and presented in the database.
6. Use a relational model for all complex databases and carefully plan which tables you need. Information shall be stored only once in a database.
7. Create the database structure before creating forms and reports for the user.
8. Create an open structure so that changes can easily be made later. It is difficult to know which information users will want to store and use in the future.
9. Make the user interface simple and stable, so that novice users can perform the desired tasks and misuse of the database is prevented.
10. Evaluate the structure and the user interface continuously during development by letting the end users use beta versions of the database.
11. Create a search function that automatically displays the different criteria on which users can base their search.
12. Store personal particulars in a legal and ethical manner.
13. A database needs to be updated continuously. This requires a good administrator interface and a plan for how to finance the cost of updating the database.

CMI Corporate Database 3.0 is a useful tool for company analysis. It is complete regarding the companies owned by KI Holding AB and can be used for various types of analysis once more company information has been collected. It needs structural improvement by:

• Splitting some data into smaller units (e.g. storing the names of persons as given name and surname in two fields, not as the full name in one field).
• Performing user tests to identify the right improvements, so that the graphical user interface becomes easier to use.
• Introducing different access levels to prevent misuse and to enable publication on the Internet.
• Automatically creating graphs that display various information to the users in an easier way.
Abbreviations

KI     Karolinska Institutet
R&D    Research and Development
VC     Venture Capital
CMI    Centre for Medical Innovations
Definitions

anaesthesiology (anaesthesiologist): The medical study of how to eliminate pain and sensation in people undergoing surgery and other medical procedures. An anaesthesiologist is a doctor who administers anaesthesia during surgery.

attribute: A name of a field (column header) in relational database terminology.

bioinformatics: The use of computers in solving information problems in the life sciences; mainly, it involves the creation of extensive electronic databases on genomes, protein sequences, etc. Secondarily, it involves techniques such as the three-dimensional modeling of biomolecules and biological systems.

biomedicine: A general term for research done in the fields of biology, biochemistry and chemistry for application in medicine.

cancer: A type of disease caused by cells that divide and grow uncontrollably, invading and disrupting other tissues and spreading to other areas of the body (metastasis).

cardiovascular: Relating to the heart and blood vessels (arteries, veins and capillaries); the circulatory system.

clinical trials: The experimental administration of new drugs or medical therapies to human patients in tightly controlled settings, to find out if there are any unexpected harmful side effects, before making the new drug or therapy available to the general population.

central nervous system (CNS): Refers to the brain and spinal cord.

combobox: A box in an Access form where you can select or type in a value, see below.

domain: The data type that describes the types and ranges of values that are allowed in each field.
endocrinology: The scientific study of the function and pathology of the endocrine glands (e.g. the thyroid gland, pituitary gland, etc.).

field types: In Access you can store information in a number of formats. The most important in CMI Corporate Database are:

| Field type | Appropriate uses | Storage space |
|---|---|---|
| Date/Time | Dates and times | 8 bytes |
| Hyperlink | Text or combinations of text and numbers stored as text and used as a hyperlink address; example http://www.ki.se | 8 bytes |
| Memo | Long text and numerical strings | 0 to 64,000 bytes |
| Number | Data that will be included in calculations | 1, 2, 4 or 8 bytes depending on the field size selected |
| Text | Data containing text, a combination of text and numbers, or numbers that don’t need to be included in calculations | 0 to 255 bytes |
| Yes/No | Fields that will contain one of two values (yes/no, true/false) | 1 bit |
genetics: The study of the patterns of inheritance of specific traits.

gynecology: A branch of medicine dealing with the diagnosis and treatment of disorders affecting the female reproductive organs.

immunology: A subfield of biology that deals with the study of antigens and the immune process and how humans and higher animals fight off disease.

infection: Growth of a disease-causing organism within the body.

list box: A box that displays a list of values where one or more selections can be made, see below.

[Figure: A list box]

medical imaging: The use of MRI scans, x-rays, PET scans, CT scans, etc. to create pictures of various features of the human body to aid in medical diagnosis.
metabolism: The sum of all of the enzyme-catalyzed reactions in living cells that transform organic molecules.

molecular biology: The study of the biochemical and molecular processes within cells, especially the processes of replication, transcription, and translation.

nutrition: The scientific study of "the sum of processes concerned in the growth, maintenance, and repair of the living body as a whole or of its constituent organs" [Graham Lusk, 1928]. These processes include ingestion and digestion of food, the conversion of food into chemical energy and other materials that the body can use or store, and excretion. The scope of the field also includes how these processes differ between organisms of different age, healthfulness, species, or environments.

obstetrics: A branch of medicine dealing with the care of women during pregnancy, childbirth, and the period during which they recover from childbirth.

pain: A sensation of suffering due to some form of stimulation of nerve endings.

pathology: The study of disease.

pediatrics: The medical study of the diagnosis and treatment of diseases and disorders in children.

relation: A table in relational database terminology.

respiration: A catabolic process that involves the exchange of oxygen and carbon dioxide between the body and the environment, the exchange of those gases within the body, and the oxidation of nutrient molecules for energy production.

telemedicine: Delivery of health services via remote telecommunications. This includes interactive consultative and diagnostic services.

tuple: A “row” in relational database terminology.

urology: A branch of medicine concerned with the diagnosis and treatment of diseases of the urinary tract and urogenital system.
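The relational terms defined above (relation, attribute, tuple, domain) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Access, which is an assumption made only for illustration; the table name `company`, its columns and the example rows are hypothetical and are not taken from CMI Corporate Database.

```python
import sqlite3

# In-memory database; SQLite stands in for Access here (an assumption).
conn = sqlite3.connect(":memory:")

# A relation (table) with the hypothetical name "company".
# Each column is an attribute. The column type together with the CHECK
# constraint approximates the domain: the values allowed in that field.
conn.execute(
    """
    CREATE TABLE company (
        name          TEXT    NOT NULL,
        founded_year  INTEGER CHECK (founded_year BETWEEN 1800 AND 2100),
        is_rnd        INTEGER CHECK (is_rnd IN (0, 1))  -- like a Yes/No field
    )
    """
)

# A tuple (row): one value for each attribute.
conn.execute("INSERT INTO company VALUES (?, ?, ?)", ("Example AB", 1995, 1))

# A value outside the domain of is_rnd is rejected by the database.
try:
    conn.execute("INSERT INTO company VALUES (?, ?, ?)", ("Bad AB", 1995, 2))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Attribute names of the relation, and the tuples it currently holds.
attributes = [row[1] for row in conn.execute("PRAGMA table_info(company)")]
tuples = conn.execute("SELECT * FROM company").fetchall()
print(attributes, tuples, rejected)
```

Only the first row ends up stored, because the second insert violates the domain of `is_rnd`.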
Purpose of the thesis
The overall purpose of this thesis was to develop an analytical tool for the investigation of healthcare companies, with a focus on individual biomedical companies as well as on the biomedical industry as a whole. The tool is intended for use in, for example, academic studies of innovation processes in Sweden (or elsewhere) and of cooperation between academia and industry. It should also make possible various forms of analysis of the development of the corporate biomedical sector in Sweden. Decision-makers in government, universities and companies need this type of analysis.

Far more money is spent in the biomedical sector than on research to understand how the sector develops. Further studies are needed to establish how fast the biomedical sector is growing and why, and to understand the different factors that influence its development. Such studies can then form the basis for future decisions. Performing them requires a tool for storing and analysing data. No such tool exists today, which makes the studies difficult to carry out.

The second purpose of this thesis was to create a tool to facilitate technology transfer between researchers at Karolinska Institutet and related companies. This matchmaking tool should support the search for companies interested in collaborating with Karolinska Institutet. Matchmaking at Karolinska Institutet is today performed mainly by the Centre for Medical Innovations (CMI), Karolinska Innovations AB (KIAB), KI Holding AB and Karolinska Investment Fund, all of which should benefit from the creation of a corporate database. The release of the database will, however, be decided by CMI.
Introduction

The biotechnology industry in Sweden
The biotechnology industry in Sweden can broadly be divided into two categories: the traditional sector, such as the pharmaceutical industry, and the newer biotechnology companies, whose customers are mainly in the established sector. Most Swedish pharmaceutical companies date from the early 20th century. They initially produced pharmaceutical products based on ideas from abroad. Not until the 1960s did the Swedish companies start to invest seriously in their own research, and during the 1970s they reached international levels. During the 1980s their sales grew by 16% per year and in the 1990s by approximately 27% per year, and the industry became one of the two most important high-technology export sectors in Sweden (the other being telecom) [1].

The importance of the new biotechnology companies is growing. The industry is calculated to have had a global turnover of $20 billion in 1998, estimated to rise to $50 billion in 2005 [2]. In Europe the biotechnology industry has been smaller than in the US but is growing faster. According to the Swedish Office of Science and Technology (STATT), the number of European biotechnology companies increased from 700 in 1996 to about 1400 in 1999. During the same period the number of US biotechnology companies increased from 960 to about 1500 [3]. In 1999 Sweden was estimated to have 130 small innovative biotechnology companies, the fourth highest number in Europe after Germany, the UK and France [4].

Is the biotechnology sector growing very fast in Sweden? We do not have reliable data to show that it is. To make such studies possible, the Centre for Medical Innovations (CMI) at Karolinska Institutet decided to develop a corporate database to use as an analytical tool for further study of the biomedical field in Sweden.
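The company counts quoted from STATT can be turned into average annual growth rates, which makes the comparison between Europe and the US concrete. The sketch below assumes compound growth over the three years from 1996 to 1999; the function name `annual_growth_rate` is hypothetical, and the result is only as precise as the rounded counts in the text.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two counts `years` apart."""
    return (end / start) ** (1 / years) - 1


# Company counts as quoted in the text (1996 -> 1999, i.e. 3 years).
europe = annual_growth_rate(700, 1400, 3)
usa = annual_growth_rate(960, 1500, 3)

print(f"Europe: {europe:.1%} per year")
print(f"USA:    {usa:.1%} per year")
```

Under these assumptions the number of European companies grew by roughly 26% per year and the number of US companies by roughly 16% per year, which is consistent with the statement that the European industry is smaller but growing faster.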
Technology Transfer at Universities
“Technology Transfer is the mechanism by which societies try to ensure that publicly funded research discoveries are transferred to companies so that they can be developed and commercialized as products that benefit the public” [5]. Technology transfer also covers the transfer of ideas in the other direction, from industry to academia. In the US the academic sector has long exploited its research, and this is one of the factors contributing to the strength of the US biotechnology sector [6].

In Sweden, industry-university collaboration has taken mainly six forms [7]:
(i) research projects funded by an outside agent for commercial reasons;
(ii) industry consulting by university personnel, doctoral studies hosted inside industrial labs etc.;
(iii) university-employed Technology Transfer offices that act as mediators between the university and small and medium-sized firms;
(iv) research institutes and other organisations jointly run by universities and private industry;
(v) the setting up of firms for commercial exploitation of research; and
(vi) financial and advisory aid to research-based firms and to individual researchers in order to facilitate patenting, licensing or direct commercial exploitation of knowledge and research results originating from universities.

The Centre for Medical Innovations tries to work with or facilitate all of the above. CMI Corporate Database is intended to support this by making it easy to find companies for research collaboration and to organise the contacts with them.
So how important is technology transfer? It has been estimated that in the US, technology transfer between universities and industry has contributed $38 billion to the economy, creating over 300,000 jobs and leading to the formation of hundreds of new companies. Much of today's evidence-based medicine is driven by clinical trials sponsored by industry, with scientists setting the standards for excellence and objectivity [8]. It is very important for Karolinska Institutet to have a professional technology transfer unit. Such a unit most likely needs a corporate database to be able to do an optimal job.
A corporate database as an analytical tool
Much money is invested today in research in the biomedical field. Apart from the fascination of understanding new processes, the reason is that society wants something good to come out of it: a new drug, a new drug delivery system, a new way of creating organs for transplantation and so on. Most of these findings need to be commercialised in order to reach the patients. The Swedish government and various companies in Sweden invest a lot of money in commercialising new biomedical findings. But are they successful? Does the research really convert into new products that reach the patients?

There is also an economic incentive for the state to invest in research and in structures that promote its commercialisation (science parks, economic support for new companies etc.). There is a general notion that the biomedical sector has a significant impact on the Swedish economy and that it will continue to grow. We know that the two big pharmaceutical companies in Sweden (AstraZeneca and Pharmacia Corporation) are important for the Swedish economy, but they are no longer purely Swedish companies, and we do not know how long they will stay in Sweden. Interest is instead focused on the small biomedical companies that might be the AstraZenecas of the future. But we do not know whether they are successful or whether the money is well invested, and if not, what we can do to correct that.

To make correct decisions we need the information required to answer questions like those above. This information must be based on reliable and valid data. To make correct and efficient studies, the data have to be collected in a structured manner and stored so that common data can be reused in other studies. The most commonly used tool for this is a database. The database must be carefully planned so that it treats the data correctly and yields valid and reliable results. It should be easy to administer and should contain functions for analysis. If the data were instead stored in a file or an Excel document, much of the work would have to be redone for each new study.

The database should be structured so that it can store different types of information. Studying the development of the biomedical sector requires a lot of data. Financial information is needed to calculate the growth of the companies. The most easily available data concern the number of employees and turnover. With that information you can estimate the total economic impact of the industry on society and how fast it is growing. Information about what the companies work with (therapeutic focus area, products etc.) is also needed for studies of the development of different parts of the biomedical sector.

It is important to study the networks and collaborations between companies and universities. This can give information about the impact of university-based research on the biomedical industry. It can also show which universities, departments or individuals are successful in transferring their research to industry; these can then be studied further to understand why they are successful, so that the knowledge can be transferred to other universities or departments.

Since we do not know which types of studies, or how many, will be done in the future, the database must have a structure that is open to change and that allows multiple studies to be performed. The importance of a good user interface and good administration functions cannot be overestimated. If the database is difficult to use, no one will use it and the whole work of setting it up will have been pointless.
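As a sketch of the kind of growth analysis described in this section, the following example sums per-company employee and turnover figures into sector totals per year and derives year-on-year growth in employment. It uses Python's sqlite3 module rather than Access; the table `financials`, its columns and all figures are invented for illustration and are not data from CMI Corporate Database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per company and year.
conn.execute(
    "CREATE TABLE financials (company TEXT, year INTEGER, "
    "employees INTEGER, turnover_msek REAL)"
)

# Invented example figures, not real company data.
rows = [
    ("Alpha AB", 1998, 10, 5.0),
    ("Alpha AB", 1999, 15, 8.0),
    ("Beta AB", 1998, 30, 20.0),
    ("Beta AB", 1999, 33, 22.0),
]
conn.executemany("INSERT INTO financials VALUES (?, ?, ?, ?)", rows)

# Sector totals per year: (year, total employees, total turnover).
totals = conn.execute(
    "SELECT year, SUM(employees), SUM(turnover_msek) "
    "FROM financials GROUP BY year ORDER BY year"
).fetchall()

# Year-on-year growth of total employment, keyed by the later year.
growth = {}
for (_, e0, _), (y1, e1, _) in zip(totals, totals[1:]):
    growth[y1] = (e1 - e0) / e0

for year, rate in growth.items():
    print(f"{year}: employment grew {rate:.0%}")
```

Because the yearly figures are stored once per company and year, the same table can serve several studies: a different grouping (for example by therapeutic focus area) would only change the query, not the stored data.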
Thesis disposition

This thesis describes the process of creating CMI Corporate Database 3.0. It starts with Materials and Methods, where the different aspects of building a database are discussed. The creation of the first two versions of the database is then described, together with their problems. The Results section presents CMI Corporate Database 3.0 and demonstrates its function with two examples. The Discussion contains conclusions about how to plan and set up a new database, and discusses the problems with our database and what should have been done differently.
Materials and Methods
This project was executed on a FujitsuSiemens Celsius 400 computer*.
Building a Database
The first thing to consider before building a database is: “Do I really have to build a database to solve my problem?” Remember that a database is nothing but an organised collection of data that (hopefully) provides information to the user. There may be other solutions to the problem: creating an Excel document, sorting your existing papers in a file, using a logbook to record meetings with companies etc. One big problem with a paper-based system is that the information most often just stays in the files and is never used. It is also difficult for people in other projects to get access to the files, if they even know that they exist.

Another solution might be to use an existing database or software package. These are often well developed and documented, and since many people and organisations use them, they are often more cost-efficient than developing your own system. Bioprint† and R&R Medical Research Management‡ are two Swedish companies that sell reports and licences for searching their databases. When we analysed whether they could meet our needs, we found that neither could. First, they do not store information about cooperation with Karolinska Institutet. Second, neither makes it possible to create new classifications and store information about a specific field that we want to investigate further (see Database integrity). Since we cannot make changes in their databases, they cannot be used as research tools. Third, the search functions include too few specific search fields. We wanted to be able to analyse companies by number of employees in different years, geographical location etc. Instead they rely on keywords for most searches, which gives search results that are too “wide” (too many hits). Finally, neither supported exporting data to Excel for further analysis and removal of “bad” results. Using these databases, complemented by a contact logbook, we could have managed our contacts with companies. But they were not good enough to serve as a scientific analytical tool, which is what we wanted CMI Corporate Database to be.

Recombinant Capital§ is a US company that stores information about many biomedical companies. Its database is probably the most comprehensive available today. Unfortunately it does not contain many Swedish companies. It does contain information about alliances between Karolinska Institutet and different companies, but only for a limited number of alliances (5 alliances as of 2001-04-27, whereas KI Holding alone is a joint owner of 28 companies). We also want information about forms of cooperation other than alliances, e.g. whether the company was started by KI researchers or whether a KI researcher is a member of the board.

The conclusion of the search for alternatives is that we need to create our own database if we want an analytical tool that we can update ourselves and that stores information about cooperation between KI and companies around the world.
* 800 MHz, 256 MB RAM, 10 GB HD
† www.bioprint.se
‡ www.bioseeker.com
§ www.recap.com
Data analysis

When planning a database, a good start is to chart the inputs and outputs of data that the database will handle. In my work on the corporate database, the information flow can be illustrated as in fig. 1.
[Figure 1: flow chart. Inputs: contacts with companies; public information about companies. Store: CMI Corporate Database. Outputs: information about a specific company (forms on screen, company info, reports, lists, labels etc.); finding companies with specific classification, financial information etc.; exporting data to Excel.]

Fig. 1. Information flow in CMI Corporate Database
The inputs are made by the users (information about contacts with companies) and by the administrator (supporting information about the companies, gathered on the Internet and through contacts with the companies). The data are then stored in the database. The most frequently used function will be to get information about, or record a contact with, a single company. Inputs are made on screen, and outputs can be presented on screen or as printed reports.

The other major use of the database is advanced analysis of the data. For this the database needs a search function that finds the companies that meet the search criteria, for example how many companies work with cancer and were started after 1990. Another interesting task is to analyse the geographical distribution of companies in specific areas, or the chronological development of the biomedical sector (see Creating a search function / What information do we need about the companies). Once the search has been performed, the data can be presented in the database or exported to Excel. If, for example, you just want more information about companies working with cancer in Stockholm, you perform a search and print out the results; if you want to perform statistical analysis, this is better done after exporting the company information to Excel.
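The example search mentioned above (companies that work with cancer and were started after 1990) and the export step can be sketched as follows. This is an illustration in Python with sqlite3 and the csv module, not the Access implementation used in the thesis; the tables `company` and `classification`, the function `search` and all rows are hypothetical. CSV is used here as an assumed interchange format because Excel can open it.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, city TEXT,
                          founded_year INTEGER);
    CREATE TABLE classification (company_id INTEGER REFERENCES company(id),
                                 focus_area TEXT);
    """
)

# Invented example companies and classifications.
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?)",
    [
        (1, "Alpha AB", "Stockholm", 1995),
        (2, "Beta AB", "Uppsala", 1985),
        (3, "Gamma AB", "Stockholm", 1998),
    ],
)
conn.executemany(
    "INSERT INTO classification VALUES (?, ?)",
    [(1, "cancer"), (2, "cancer"), (3, "cardiovascular"), (3, "cancer")],
)


def search(focus_area, founded_after):
    """Return (name, city, founded_year) for companies matching both criteria."""
    return conn.execute(
        "SELECT c.name, c.city, c.founded_year "
        "FROM company c JOIN classification k ON k.company_id = c.id "
        "WHERE k.focus_area = ? AND c.founded_year > ? "
        "ORDER BY c.name",
        (focus_area, founded_after),
    ).fetchall()


hits = search("cancer", 1990)
print(len(hits), "companies found")

# Write the hits as CSV text, a format that Excel can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "city", "founded_year"])
writer.writerows(hits)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping classifications in a separate table lets one company carry several focus areas, so a company working with both cancer and cardiovascular disease is found by either search.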
Database integrity

CMI Corporate Database is intended to be a tool for academic research. This requires that the database is valid (all false information is excluded from the database) and complete (all true information is stored in the database). A database that is both valid and complete is said to have integrity [9].

Validation can be done in a number of ways [10]. In our database the information is typed in manually by the system administrator. To confirm the data, the companies are asked to look at it and verify it. It is also important to be able to store the date and time when the data were updated, and in which project. It must also be possible to record the sources of external information (for example the Affärsdata corporate database). The user should also be able to choose to get only results that include companies that have confirmed their data. The data in CMI Corporate Database differ in how important validation is. For example, a company’s e-mail address (though important when contacting companies) does not have to be validated for a study of the chronological development of cancer research companies, unlike the number of employees or the classification of a company as a cancer research company.

The completeness of the data is also of great importance when the database is used as an analytical tool. As the field of biomedical companies is developing rapidly, it is impossible to keep the database updated with all companies at all times. It must therefore be possible to record the latest date on which the database was updated. I have tried to find as many companies as possible by using existing databases and information from trade fairs and books (see table 1). Since the database could not be complete for all companies in Sweden related to the biomedical field, I had to decide which part of the database should be complete. One could, for example, choose to include all R&D biotechnology companies in Sweden that work in the surgery field, dig deep into that field and make sure that all such companies are included. For CMI Corporate Database we decided that the first step (and the scope of this thesis) should be completeness regarding the 28 companies owned by Karolinska Holding AB (a holding company owned by Karolinska Institutet). We considered this manageable, since it would be possible to interview the staff of KI Holding and have them validate the data. CMI Corporate Database should thus have integrity for companies owned by KI Holding AB.
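The integrity requirements described above (recording the source and update date of the data, marking which companies have confirmed their data, and restricting results to confirmed companies) can be sketched as follows. The example uses Python with sqlite3 as a stand-in for Access; the column names `source`, `updated_on` and `confirmed`, the function `companies` and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE company (
        name        TEXT,
        employees   INTEGER,
        source      TEXT,     -- where the information came from
        updated_on  TEXT,     -- ISO date of the latest update
        confirmed   INTEGER   -- 1 if the company has verified its data
    )
    """
)

# Invented example rows, not real company data.
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?, ?, ?)",
    [
        ("Alpha AB", 12, "Affärsdata", "2001-03-01", 1),
        ("Beta AB", 40, "exhibition catalogue", "2000-11-15", 0),
        ("Gamma AB", 7, "interview", "2001-04-20", 1),
    ],
)


def companies(confirmed_only=False):
    """List company names, optionally only those with confirmed data."""
    sql = "SELECT name FROM company"
    if confirmed_only:
        sql += " WHERE confirmed = 1"
    return [row[0] for row in conn.execute(sql + " ORDER BY name")]


# ISO dates sort as text, so MAX gives the most recent update.
latest_update = conn.execute("SELECT MAX(updated_on) FROM company").fetchone()[0]

print(companies(confirmed_only=True))
print("Database last updated:", latest_update)
```

A study that depends on validated figures would call `companies(confirmed_only=True)`, while a mailing list could use all rows; the latest update date can be shown to users as an indication of how complete the data are likely to be.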
Table 1. Information sources for CMI Corporate Database

| Information type | Name | Publisher |
|---|---|---|
| Database | Affärsdata | Affärsdata i Sverige AB |
| Book | Nordiska Företag i Bioteknik | BioPrint 1997 |
| Book | Forskningsbaserat medicintekniskt företagande | NUTEK – B1993:12 |
| Exhibition catalogue | Medicin Scandinavia 2000 | Svenska Mässan Göteborg |
| Catalogue | Hjälpmedelsöveriskten 2000 | HSI (Hälso- & SjukvårdsInformation) |
| Catalogue | Röda Sidorna 2001 | HSI (Hälso- & SjukvårdsInformation) |
| Book | Den biomedicinska industrin i Sverige | B1998:8 NUTEK |
| Exhibition catalogue | Biotech Forum 2000 | Stockholm International Fairs |
| Exhibition catalogue | Nordic Lab.xpo | Stockholm International Fairs |
| Exhibition catalogue | Pharm Expo | Stockholm International Fairs |
| Interviews | Karolinska Holding AB (CEO Folke Meijer) | |
When evaluating the completeness of the database regarding all R&D biomedical companies in Sweden, we compared it to existing reports and databases. A report from Ernst & Young [4] estimated the number of biotech companies in Sweden at about 140 in 1999. R&R Medical Research Management has 239 companies in its biotechnology database, and the Bioprint On-line database includes 500 Nordic companies in the biotechnology field. These figures are interesting to compare with our total number of R&D biomedical companies in Sweden (see Discussion).
In Sweden, PUL (Personuppgiftslagen 1998:204) and Datainspektionen set very strict conditions for how personal particulars may be stored in a database. The purpose of the law is to protect individuals from having their integrity violated through the handling of their personal particulars [11]. Each database that deals with personal information must have a person who is responsible for it. That person must make sure that [12]:
a) Personal particulars are treated legally.
b) Personal particulars are always treated in a correct way and in accordance with good custom.
c) Personal particulars are only collected for certain, explicitly stated and legitimate purposes.
d) Personal particulars are not treated for any purpose that is incompatible with that for which the information was collected.
e) The personal particulars that are treated are adequate and relevant in relation to the aim of the treatment.
f) No more personal particulars are treated than necessary with regard to the aim of the treatment.
g) The personal particulars used are correct and, if necessary, current.
h) All reasonable measures are taken to correct, block or delete personal particulars that are incorrect or incomplete with regard to the purpose of the treatment.
i) Personal particulars are not stored for longer than necessary with regard to the purpose of the treatment.

An important addition to item d is that personal information may be stored and treated for historical, statistical or scientific purposes that differ from the purposes for which the information was collected. Information for these purposes may also be stored for a longer period of time. Any action taken regarding the registered person is only allowed with the approval of the individual, or if there are very important reasons with regard to the registered person’s vital interests.
For research and statistical use, 19§ PUL, second paragraph, states that personal particulars may be stored if the research project has been approved by a research ethics committee.

PUL and CMI Corporate Database

In the CMI Corporate Database some personal particulars are being treated. They are collected from websites, books and exhibition catalogues. All these sources of information are public, and the information about the person is not sensitive (name, company and phone number, as opposed to sensitive information such as religion, race or membership in a labor union). Still, the information needs to be treated carefully.
The database stores information about the employees of the companies. Each record that holds information about a person has a yes/no field where the administrator can select whether the information should be shown to all users. A query is then based on this field so that only persons who have given their permission are shown to the users. The reason for storing the information before permission has been given is to be able to send a letter to the individuals asking for their permission to store and show their information in the database. If they say no, their record is immediately deleted. There is also the possibility of using 19§ PUL to store information for research purposes. Researchers wanting to perform such studies then apply to the research ethics committee, and if it approves, the researcher can use the database for storing the personal particulars.
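The permission mechanism described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than Microsoft Access. The table name `person`, the column `show_to_users` and the helper functions are assumptions made for the example and are not taken from the thesis database.

```python
import sqlite3

# Hypothetical table; show_to_users plays the role of the yes/no field.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " company TEXT NOT NULL,"
    " show_to_users INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO person (name, company, show_to_users) VALUES (?, ?, ?)",
    [("Person A", "Company A", 1), ("Person B", "Company A", 0)],
)


def visible_persons(conn):
    """Names of the persons who have given permission to be shown."""
    rows = conn.execute(
        "SELECT name FROM person WHERE show_to_users = 1 ORDER BY name"
    )
    return [r[0] for r in rows]


def withdraw(conn, name):
    """Delete the record at once when a person declines."""
    conn.execute("DELETE FROM person WHERE name = ?", (name,))


print(visible_persons(conn))
```

Because ordinary users only ever see the result of `visible_persons`, a record that is still awaiting an answer stays hidden without any extra logic in the user forms.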
CMI Corporate Database 1.0
When I started the project I was given an existing database with information about 125 healthcare-related companies. The company information was stored in 29 fields (see fig. 2).
Fig. 2. The first database contained 29 fields of information
Sixteen fields contained general information in text format (e.g. company name, postal code, website). Five contained numerical information that could be used for analytical purposes (establishing year, turnover 1996 and 1999, employees 1996 and 1999). There were also eight Yes/No fields for classification of the company. The classification was taken from a NUTEK analysis [13]. The different classifications were:
Table 2. Classifications in the first database

Classification: R&D
Description: Whether the company performed research and development. (Marketing subsidiaries in Sweden of international pharmaceutical companies were, for instance, not regarded as R&D companies.)
Example of company: Astrazeneca, Medivir, Karo Bio

Classification: Medical equipment
Description: Based on the company’s production of products
Example of company: Althin Medical, Artema medical, Gambro

Classification: Disposables
Description: Same as above
Example of company: Atos Medical, HemoCue, Medi Team

Classification: Handicap equipment
Description: Same as above
Example of company: Cenova, Etac, Liko

Classification: Pharmaceutical, diagnostics
Description: Same as above
Example of company: Abigo Medical, Astrazeneca, Biora, Karo Bio

Classification: Products, services etc
Description: Products and services that are used for biomedical research, clinical trials and registration of new pharmaceuticals
Example of company: BiaCore, CMA Microdialysis, CyberGene

Classification: Products and services, specific
Description: Products and services that are sold specifically for the healthcare industry
Example of company: Carmeda, GistBrocades, Pharmadule
As a basis for an analytical tool this classification was quite poor. For example, it did not include any searchable information about which therapeutic area the companies worked in. We therefore had to produce a new classification system (see Classification of companies). This version of the database was of very limited use since so little information could be stored about each company, so in the next version the number of fields was increased (see Company information). The database I was given also contained a form for updating, searching and printing information (fig. 3).
Fig. 3. Update/search/print form
In the form you could easily update all information about a company. One big problem was that all users of the database were able to change the information. If you wanted to find a specific company you could use the “Filter by Form” function that is built into the Microsoft Access software. “Filter by Form” is a tool that creates a query based on the information that you type into the different fields of the form. This would have been possible to use, but it gave me two problems: 1) the user had to use the Access menus, and I wanted to create a “button interface” to prevent misuse of the database; 2) I wanted to present the classification criteria in the form of a table and not as a click button for each option, which “Filter by Form” does not allow. The “Filter by Form” function is not “reprogrammable”, so I had to create a new search function (see Creating a search function).
CMI Corporate Database version 2.0
What information do we need about companies?

Company information
The first version of CMI Corporate Database contained some general information about each company. In the 2nd version we added the address of the company’s website and a field with information about a research contact at each company. We considered this necessary because many of our contacts with a company are not with the CEO but with the head of research.

In the 2nd version the first information about cooperations with Karolinska Institutet was also registered. It was possible to classify (and then search for) companies having a connection to Karolinska Institutet, with the following subgroups:
- Karolinska Holding being a joint owner of the company
- A former/present researcher at KI being the founder of the company
- A former/present researcher at KI being a member of the board
- A former/present researcher at KI being a member of the scientific board or advisory board
- The company has one or more research collaborations with KI
- A former KI researcher being employed by the company (preferably in senior management)
You could then also store information about a contact person at Karolinska, with contact information. If there were several types of cooperation you could save information about them in a memo field.

In summary, the general information about each company in CMI Corporate Database version 2 was:
- Company name and address
- Phone and fax number
- E-mail address and website
- Description of the company
- Establishing year
- Information about cooperations with Karolinska Institutet
- Contact person and research contact (including position and contact information)
We also wanted to store more extensive financial information about the companies.
The information collected was:
- Annual report
- Turnover 1996 and 1999
- Number of employees 1996 and 1999
- Owners
- Subsidiaries and other investments (for VC companies)
- Stock market listing
New in version 2.0 of the database was the possibility to store the annual report and, for example, which companies a venture capital firm was investing in.
Classification of companies
Classifying biomedical companies is a difficult task. We chose to classify them by therapeutic focus area and by the main type of product or service the company sells (pharmaceuticals, contract research, production, consulting etc.). The therapeutic areas were chosen by looking at the different institutions at Karolinska Institutet and the different departments at Karolinska Hospital. The focus area classification was then set to the following 27:

Classification (Focus areas)
- Anesthesiology / Intensiv care **
- Bioinformatics / IT-products
- Cancer
- Cardiovascular
- Cell and Molecular Biology
- CNS/Pain
- Ear / Nose / Throat
- Endocrinology / Metabolism
- Eye
- Gastrointestinal
- Genetics
- Geriatrics
- Gynecology / Obstetrics
- Haematology
- Immunology / Infection
- Kidney / Urology
- Lung / Respiration
- Microbiology
- Nutrition
- Odontology
- Osteoporosis
- Other
- Pathology
- Pediatrics
- Psychiatry
- Skin / Wound healing
- Transplantation

The area of bioinformatics and IT-products was then subdivided into the following areas:
Classification (Bioinformatics / IT-products)
- Functional Genomics
- Medical Imaging
- Protein Identification
- Proteomics
- Bankrupt
** The misspelling of “intensive care” in this field was made when this version of the database was created. It was very difficult to correct within the structure of CMI Corporate Database 2.0. This problem and its solution are dealt with in chapter 6, Problems and solutions for CMI Corporate Database 2.0.
The classification of products was made as follows:

Classification (Products)
- Biomedical Knowledge
- Clinical Trials
- Consulting
- Contract Research
- Diagnostics
- Disposable Goods
- Financier
- Functional Foods
- Handicap Equipment
- Healthcare Provider
- IT-product
- IT-service
- Laboratory Product
- Legal advice
- Med. tech. Equipment
- Personell Letting
- Pharmaceuticals
- Patent advice

E-health companies were further classified into:

Classification (e-health)
- Patient info. Portal
- Professional portal
- Software
- Telemedicine
This was a good start, but in the creation of CMI Corporate Database 3.0 (see The useful tool CMI Corporate Database 3.0) we made it possible for the administrator to add new classification areas. This was a must if we wanted the database to live on.
All the information in version 2.0 of CMI Corporate Database was stored in one single table (see fig. 4). This table contained 113 fields. Many of them were text fields, which can store up to 255 characters. The classification was stored in true/false fields and financial information in number fields. The description and contacts fields were in the memo format, which can store more than 255 characters. When all contact information about a company is stored in one long memo it becomes very difficult to sort and find old contacts. This was solved in the 3rd version of CMI Corporate Database (see CMI Corporate Database 2.0 - Problems and solutions).
Creating new forms
To make the database more user friendly I created a graphical user interface built from forms. The forms contained text boxes, combo boxes and list boxes where the user could choose information, and buttons to perform commands. The main page for the user contained a combo box where the user could type in the name of the company (or select it from the list) to show information about it or to register a contact with it. He or she could also press a button to move to the search page or to enter the administrator view (fig. 5).
Fig. 4. CMI Corporate database v 2.0 - User main page
If the user pressed the “show” button, the company information page was shown (fig. 6).
Fig. 5 The 2nd version of the database contained 113 fields of information in one table
Fig. 6. Forms that show information about a company
The company information page consisted of multiple pages with all the information that the database had stored about the company. The classification of the company was shown by checkboxes. They showed the correct information, but for most companies just one or two boxes were checked and the rest were empty, which made it difficult to get an overview of the information. This problem is dealt with further in CMI Corporate Database 2.0 - Problems and solutions. The user could print out the forms, but there were then of course many pages of “no-info”, just showing which classifications the company did not have. To show as few “no-info” pages as possible, the database checked whether there was any information on a page each time a new company was shown. The code for this was (example):
    If Me![Category R&D Biocompany] = True Then
        If Me![Flik RD Biocompany].Visible = True Then GoTo Rad1 Else Me![Flik RD Biocompany].Visible = True
    Else
        If Me![Flik RD Biocompany].Visible = False Then GoTo Rad1 Else Me![Flik RD Biocompany].Visible = False
    End If
    Rad1:
    If Me![Category Venture Capital company] = True Then
        If Me![Flik Venture Capital].Visible = True Then GoTo Rad2 Else Me![Flik Venture Capital].Visible = True
    Else
        If Me![Flik Venture Capital].Visible = False Then GoTo Rad2 Else Me![Flik Venture Capital].Visible = False
    End If
    . . .
In this example the code first checks whether the field [Category R&D Biocompany] is true (the box is checked). If it is, the corresponding page is set to be visible. The screen tended to flicker slightly whenever the Visible property was set. That is why line 2 contains an If statement that sets the property to visible only if the page was invisible before, and does not set it again if the page is already visible.

The advanced search page form (left) contained empty textboxes where the user could type in search criteria. The user could also choose criteria such as specific cities by selecting them from a combobox. That combobox contained the names of all cities where a company was registered. This prevented the user from misspelling the name and also made it easier to find the name the user was looking for. It was even more useful in the combobox where you could search for the companies that a specific institution collaborated with. You then just scrolled through the list to find the institution instead of having to make sure that you spelled the name correctly. After selecting the criteria the user pressed the search button, and the database returned a list of all the companies fulfilling the criteria. From the search results form the user could then choose to get more information about specific companies (more about the search function in Creating a search function).

All the pages showing information to the user were locked, so any new input or change of data was made from the administration page. On this page the administrator could change information about a company, write new descriptions, change classifications etc. (fig. 7).
Fig. 7 CMI Corporate Database 2.0 - Administration Page
Creating a search function
The search function should be able to perform a search based on information given by the user in a form containing 9 textboxes, 8 comboboxes and 6 listboxes (see figure 8).
Fig. 8 The search forms.
The user was first requested to type in information to be used as search criteria or to select values in the listboxes. When the user pressed the search button, the program looked through the form step by step and built the filter that was used to select the companies that fulfilled the criteria. The code consisted of several parts. First the variables were dimensioned: strvillkor was a string that contained one specific search criterion at a time, and strfraga was the total collection of conditions that was used as the filter. strfraga was then given the start value “[Name] > ‘’” to make it possible for the users to find all companies in the database by leaving the form blank. The textboxes were searched with the following code (example):
    If Me![qName].Value <> "" Then
        strvillkor = Me![qName].Value
        strvillkor = BuildCriteria("[Name]", dbText, strvillkor)
        strfraga = strfraga & " And " & strvillkor
    End If
Here the first line checks whether the textbox contains any information. Line 2 gives strvillkor the value of the textbox. Line 3 uses the Access function BuildCriteria, which converts the text in the box into a correctly formatted criterion. For example, if the user types >B in the Name field, the function changes that into the criterion “[Name] > ‘B’”. The fourth line then adds that criterion to strfraga with the boolean operator And in between. In our example strfraga now contains the string “[Name] > ‘’ And [Name] > ‘B’”. The search should now return all the companies in the database whose names sort after B. The code then continues through the rest of the textboxes in the same way.
    If Me![qOrgnr].Value <> "" Then
        strvillkor = Me![qOrgnr].Value
        strvillkor = BuildCriteria("[Orgnr]", dbText, strvillkor)
        strfraga = strfraga & " And " & strvillkor & ""
    End If
The comboboxes work in the same way, so the line:
strvillkor = Me![qCity].Value
returns the selected value in the [qCity] combobox. This value can then be passed to the BuildCriteria function and added to the search criteria. The listboxes were a little more difficult. Since I had created the listboxes manually, giving each line a specific value, I had to make the search function go through each line in each listbox, check whether it was selected and then build a criterion. It looked as follows:
    If Me![Listruta Focusarea].Selected(2) = True Then
        strvillkor = Me![Listruta Focusarea].Selected(2)
        strvillkor = BuildCriteria("[Cancer]", dbBoolean, strvillkor)
        strfraga = strfraga & " And " & strvillkor & ""
    End If
This piece of code first checks whether line 2 in the listbox [Listruta Focusarea] is selected. If it is, it builds the criterion that the field [Cancer] must be True for all the companies returned by the search function. Simple? Yes, but very limiting, since any change in e.g. a focus area (a corrected misspelling or a new area) must be made manually in the code. This problem and its solution are discussed further in chapter 5 and 9. After looking through all of the listboxes there is only one line of code left:
DoCmd.OpenForm "Advanced search results", , , strfraga
This command opens the form “Advanced search results” and uses the criteria in strfraga as a filter. The filter selects which companies from the table “CMI företagsdatabas” should be shown in the results form. The function was simple in that it really contained just two different programming parts. But because each line of each listbox had to be checked separately, the code for the search function came to 573 lines.
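The way the filter string is assembled can be sketched in Python as follows. This is an illustration only. `build_criteria` is a rough, hypothetical stand-in for the Access BuildCriteria function that handles just an optional comparison operator on text values, and the dictionary of form values replaces the textboxes of the real form.

```python
import re


def build_criteria(field, text):
    """Rough stand-in for BuildCriteria: turn '>B' into "[Name] > 'B'".

    Only an optional leading comparison operator is recognised. This is
    an assumption for the sketch, not a reimplementation of Access.
    """
    match = re.match(r"\s*(<=|>=|<>|<|>|=)?\s*(.*)", text)
    operator = match.group(1) or "="
    value = match.group(2).replace("'", "''")  # escape single quotes
    return f"[{field}] {operator} '{value}'"


def build_filter(form_values):
    """Join one criterion per non-empty form field with And."""
    # Start value that every named company satisfies, so that a blank
    # form returns all companies.
    conditions = ["[Name] > ''"]
    for field, text in form_values.items():
        if text != "":
            conditions.append(build_criteria(field, text))
    return " And ".join(conditions)


print(build_filter({"Name": ">B", "Orgnr": ""}))
```

With the form values from the example in the text, the sketch produces the string `[Name] > '' And [Name] > 'B'`. A completely blank form yields only the start value.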
CMI Corporate Database 2.0 - Problems and solutions
The 2nd version of the database had many constraints and problems. First of all, it was very difficult to make any changes in the database. Even minor changes required programming by the system developer. If you, for example, wanted to add a new focus area you first had to add that field to the main table. If you then wanted it to be shown on the screen you had to add the field to each of the forms. In the search form you had to add it to the focus area listbox and then reprogram the search function to include the field. The same had to be done if you wanted to change the name of a focus area or correct a misspelling, or if you wanted to add a new type of cooperation between a company and KI. This was of course very limiting and time consuming.

Another big limitation was that you could only store searchable information about one key person, one research contact etc. The page with information about cooperation had checkboxes that were set to true or false depending on whether the company had the specified type of cooperation. But you could still only store details about one cooperation, and you could not specify which type of cooperation that information referred to. This problem was due to the fact that all information about one company was stored in one single record of one table.

The contacts with each company were recorded in one memo field. If you wanted to type in a new contact you had to go to the bottom of the field and add your information there. This was very unstructured, and it was also impossible to search or sort the contacts.

As I have written before, there was also the problem with “no-info”. Every page shown on the screen or printed had to contain all focus areas, types of cooperation etc., each with a checkbox showing whether it was true or false (e.g. whether the company had that focus area or not). Since most of the companies had just a few focus areas, most of the information on the page showed which focus areas the company did not have. The information you really want to show to the user is which focus areas etc. the company does have, and this was easy to miss in the blur of all the other text and checkboxes on the form or print-out.

When conducting an academic study using the database you must be able to know when the information about a company was updated. This was possible in the database, but you could only store information about the latest update. If that was, for example, a limited update of the e-mail addresses of the company, you could not know when the financial information was last updated.

When doing a study you also want to be able to put a mark on a company so that you can easily find it again, and to include that mark in the search function. To do that in the 2nd version of the database you would first have to add that field to the main table, then add it to the search form and reprogram the search function. That is not very easy, and if the database is not easy to use, no one will use it.
The solution - creating a relational database
When analysing the problems I realised that the best thing to do was to redo the structure of the database using the knowledge I had gathered during the development of the first two versions. I had learned that it is not a good database structure to have all information in just one table. The structure is simple, but very inflexible. Instead you should create a relational database.
So what is a relational database, and first of all, what is a relationship? In our database we have, for example, a record of a person (an entity) with different attributes (phone number, position, e-mail address etc.) and a record of the company where the person is employed, with other attributes. The connection between the person and the company is then called a relationship. Note that this is only one relationship and not two. It looks like two (the company has an employee, and the employee has a company) only because you look at the relationship from two different perspectives (the employer’s and the employee’s).

There are many types of relationships; the most common are one-to-one, one-to-many and many-to-many. A one-to-one relationship exists, for example, if a company can only have one contact person and a contact person can only be that for one company. The one-to-many relationship is very common in our database: one company can have many employees, but each employee can only be related to one company. Finally, a company can have many classifications (cancer research, immunology research etc.) and each classification can be related to many companies (many companies can do cancer research etc.). This is a many-to-many relationship.

Of course a relationship can involve more than two entities. A company can be related to an institution, and the institution is related to its employees. The number of entities involved in a relationship is called the degree of the relationship. If three entities are involved it is called a ternary relationship.

Each record (entity) must have a unique key. This can be a name, a number etc. that makes it possible to distinguish it from other records. If a record holds several unique values, the database creator chooses which field contains the identification value. This field is called the primary key.
It is also possible to use several fields together (a composite key) to create the unique identification. The relational data model has its own set of terms. A row (record) is called a tuple, a field (column header) is called an attribute and the whole table is called a relation. The data type that defines the allowed types and values in each field is called a domain. Edgar F. Codd introduced this model in 1970 [14].
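The concepts above (primary key, composite key, one-to-many and many-to-many relationships) can be made concrete with a small sketch. It uses Python's sqlite3 module instead of Microsoft Access. The table and column names are hypothetical and deliberately simpler than the tables of the CMI Corporate Database, and the sample data are invented.

```python
import sqlite3

# Hypothetical, simplified schema; not the actual table layout of the thesis.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (
    name TEXT PRIMARY KEY                  -- primary key: unique per company
);
CREATE TABLE employee (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    company TEXT NOT NULL REFERENCES company(name)  -- one company, many employees
);
CREATE TABLE classification (
    name TEXT PRIMARY KEY
);
CREATE TABLE company_classification (      -- link table for many-to-many
    company        TEXT NOT NULL REFERENCES company(name),
    classification TEXT NOT NULL REFERENCES classification(name),
    PRIMARY KEY (company, classification)  -- composite key
);
INSERT INTO company VALUES ('Company A'), ('Company B');
INSERT INTO employee (name, company) VALUES
    ('Person 1', 'Company A'), ('Person 2', 'Company A');
INSERT INTO classification VALUES ('Cancer'), ('Immunology / Infection');
INSERT INTO company_classification VALUES
    ('Company A', 'Cancer'),
    ('Company A', 'Immunology / Infection'),
    ('Company B', 'Cancer');
""")


def employees_of(company):
    """One-to-many: all employees related to one company."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE company = ? ORDER BY name", (company,))
    return [r[0] for r in rows]


def companies_in(classification):
    """Many-to-many: all companies that carry one classification."""
    rows = conn.execute(
        "SELECT company FROM company_classification "
        "WHERE classification = ? ORDER BY company", (classification,))
    return [r[0] for r in rows]


print(employees_of("Company A"), companies_in("Cancer"))
```

The composite key on the link table means that the same company cannot be given the same classification twice, while any number of different classifications per company is allowed.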
Relations in CMI Corporate Database 3.0
In the 3rd version of CMI Corporate Database the original table was split up into 17 tables (fig. 9). The main table was called tblcompany and contained the “main” company name field and general information about each company. The information about cooperations between companies and Karolinska Institutet was divided into two tables: one with information about the cooperations (tblcooperation) and one with the different cooperation types (tblcooperationtype). tblcooperation was then linked further to tblkicontactperson, where information about the contact person at KI was stored. This is a classical use of relational databases. When you register a contact person at KI for a cooperation you just store the name of the person. The contact information about that person is stored in another table. So for every cooperation that involves that person you only have to type in the name and not the contact information. And if that person, for example, changes phone number, you only have to change it in one place, tblkicontactperson, and not in every record containing a cooperation.
Fig. 9 Relations in CMI Corporate Database 3.0. The relations are shown by lines. 1---1 represents one-to-one relations and 1---∞ represents one-to-many relations.
The problem with storing information about many employees at the companies was also solved by this model. The table tblcompanyemployee contained records with information about the employees, with a field for the company name related to tblcompany. Now you could store information about as many employees as you wanted. The same is true for the contacts. Each contact is stored in one record in the tblcontact table. The company name is related to companyname in tblcompany and the contact person at the company is related to tblcompanyemployee. A new table, tblupdate, was created to store information about updates. Here an unlimited number of update records could be stored. The update type was related to tblupdate, where you could store a new update type, use it as a mark and include it in your search.
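How a separate update table removes the “latest update only” limitation can be sketched as follows. This is again a Python sqlite3 illustration under assumptions. The table `company_update`, its columns, the update type names and the sample dates are hypothetical and are not the actual definition of tblupdate.

```python
import sqlite3

# Hypothetical stand-in for an update table: one record per update.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_update ("
    " id INTEGER PRIMARY KEY,"
    " company TEXT NOT NULL,"
    " update_type TEXT NOT NULL,"
    " updated_on TEXT NOT NULL)"  # ISO dates sort correctly as text
)
conn.executemany(
    "INSERT INTO company_update (company, update_type, updated_on)"
    " VALUES (?, ?, ?)",
    [
        ("Company A", "E-mail addresses", "2001-04-02"),
        ("Company A", "Financial information", "2000-09-15"),
        ("Company A", "Financial information", "2001-01-20"),
    ],
)


def latest_update(conn, company, update_type):
    """Most recent date for one kind of update, or None if never updated."""
    row = conn.execute(
        "SELECT MAX(updated_on) FROM company_update"
        " WHERE company = ? AND update_type = ?",
        (company, update_type),
    ).fetchone()
    return row[0]


print(latest_update(conn, "Company A", "Financial information"))
```

Because each update is its own record, a later e-mail update no longer hides the date of the last financial update.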
A new search function
In the new relational model of the database the different classifications were stored in a table. This table was then the base for the listbox containing classifications in the search form. Now you could just add a new classification type in tblclassification and the search form would automatically be updated with that classification. The same was true for any changes. If you wanted to correct the misspelling “intensiv care” to “intensive care”, you just corrected the spelling in tblclassification, and since that field is related to the classification field in tblcompanyclassification the spelling was corrected for all companies having that classification.
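The effect of correcting a spelling in one place can be sketched with a cascading update. The sketch below uses Python's sqlite3 module and a foreign key declared with ON UPDATE CASCADE as a stand-in for the related fields in Access. The simplified table names and the sample companies are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE classification (
    name TEXT PRIMARY KEY
);
CREATE TABLE company_classification (
    company        TEXT NOT NULL,
    classification TEXT NOT NULL
        REFERENCES classification(name) ON UPDATE CASCADE,
    PRIMARY KEY (company, classification)
);
INSERT INTO classification VALUES ('Anesthesiology / Intensiv care'), ('Cancer');
INSERT INTO company_classification VALUES
    ('Company A', 'Anesthesiology / Intensiv care'),
    ('Company B', 'Anesthesiology / Intensiv care'),
    ('Company B', 'Cancer');
""")


def rename_classification(conn, old, new):
    """Correct the name once; the related records follow automatically."""
    conn.execute("UPDATE classification SET name = ? WHERE name = ?", (new, old))


def companies_with(conn, classification):
    rows = conn.execute(
        "SELECT company FROM company_classification"
        " WHERE classification = ? ORDER BY company", (classification,))
    return [r[0] for r in rows]


rename_classification(conn, "Anesthesiology / Intensiv care",
                      "Anesthesiology / Intensive care")
print(companies_with(conn, "Anesthesiology / Intensive care"))
```

Any list that is populated from the `classification` table, such as the listbox in a search form, shows the corrected name without further changes.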
The relational model of the database demands a new type of search function. Since the information is stored in many different tables you cannot just add a filter to the form. Instead the search is performed through queries.

The listboxes

As in the previous version of the database, the search function first created a search criterion based on the information the user selected in the search form. The new search function searched through the listboxes with the following code (example for the city listbox):
    first = True
    Set ctl = [listrutaCity]
    For Each itm In ctl.ItemsSelected
        strvillkor = BuildCriteria("[City]", DB_TEXT, ctl.ItemData(itm))
        If first = True Then
            strfraga = strfraga & " and ((" & strvillkor & ")"
            first = False
        Else
            strfraga = strfraga & " or (" & strvillkor & ")"
        End If
    Next itm
    If first = False Then strfraga = strfraga & ")"
This code uses For Each...In .ItemsSelected to find which items in the list are selected. The criterion is then built from the information stored in each selected item, which is read with the .ItemData() function. Since the item data in this example are based on the cities stored in tblcompany, you can just change a city name or add a new city in that table, and the search function can automatically use it as a criterion without any new programming.

Searching through queries

As stated before, the new search function used queries to search for companies. This is done by the following code:
    Set db = CurrentDb()
    Set query = db.querydefs("querycompanyinfo")
    query.sql = "Select [Companyname] From [tblcompany] Where " & strfraga & ";"
    query.Close
This sets the criteria for the query querycompanyinfo, which searches through the first table, tblcompany. The query is of the type Select...From...Where and returns the company names from tblcompany for the companies that fulfill the criteria given by strfraga.

More queries

The search function now continues and produces results from search criteria in other tables. For the classification the following code is used:
    querycriteria = ""
    Set ctl = [Listrutaclassification]
    If ctl.ItemsSelected.Count > 0 Then
        For Each itm In ctl.ItemsSelected
            If Len(querycriteria) = 0 Then
                querycriteria = Chr(34) & ctl.ItemData(itm) & Chr(34)
            Else
                querycriteria = querycriteria & "," & Chr(34) & ctl.ItemData(itm) & Chr(34)
            End If
        Next itm
        Set db = CurrentDb()
        Set query = db.querydefs("queryclassification")
        query.sql = "Select [Companyname] From [tblcompanyclassification] Where [Classification] In (" & querycriteria & ");"
        query.Close
    Else
        Set db = CurrentDb()
        Set query = db.querydefs("queryclassification")
        query.sql = "Select [Companyname] From [tblcompany]"
        query.Close
    End If
This part of the search defines a new query, queryclassification, using Select...From...Where...In. This is needed because the classification of companies is stored in a table where each record contains one classification of one company. So if, for example, Amgen is classified as both a cancer and a genetics company, tblcompanyclassification would look like this:

Company    Classification
Amgen      Cancer
Amgen      Genetics

If you used the code Select [Companyname] From [tblcompanyclassification] Where [Classification] = “Cancer” And [Classification] = “Genetics” you would not get any results, since no single record contains both classifications. Select...From...Where...In avoids this by returning every company that has a record matching any of the selected classifications. The query results in a list of names of companies that fulfill the classification criteria.

The search now continues and creates a query for each table. All of these queries are then used as the base for a last query, querySearchResults (fig. 10 and 11), which finds the companies that are present in all query results. This query is then the base for the search result form, which presents the results to the user.
Fig. 10 QuerySearchResults: the final query, which selects the companies that are present in all subqueries
Fig. 11 SQL code for the query QuerySearchResults
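The SQL of Fig. 11 is not reproduced in the text. As a rough illustration of the idea, the following sketch shows one way a final query can keep only the companies that appear in every sub-query, here with an inner join. It uses Python's sqlite3 module in place of Access; the views stand in for the saved Access queries, and the City field and the company names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblcompany (Companyname TEXT PRIMARY KEY, City TEXT);
    CREATE TABLE tblcompanyclassification (Companyname TEXT, Classification TEXT);
    INSERT INTO tblcompany VALUES
        ('Alpha AB', 'Stockholm'), ('Beta AB', 'Lund'), ('Gamma AB', 'Stockholm');
    INSERT INTO tblcompanyclassification VALUES
        ('Alpha AB', 'Cancer'), ('Beta AB', 'Cancer'), ('Gamma AB', 'Genetics');

    -- Views stand in for the saved queries, one per searched table
    CREATE VIEW querycompanyinfo AS
        SELECT Companyname FROM tblcompany WHERE City = 'Stockholm';
    CREATE VIEW queryclassification AS
        SELECT DISTINCT Companyname FROM tblcompanyclassification
        WHERE Classification IN ('Cancer');

    -- The final query keeps only the names found in every sub-query
    CREATE VIEW querySearchResults AS
        SELECT i.Companyname
        FROM querycompanyinfo AS i
        INNER JOIN queryclassification AS c ON c.Companyname = i.Companyname;
""")

search_results = [name for (name,) in conn.execute(
    "SELECT Companyname FROM querySearchResults ORDER BY Companyname")]
print(search_results)  # ['Alpha AB']
```

Because the final result is itself a query, it can be reused as the record source for forms and reports, which is the advantage discussed in the text.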
Another advantage of using queries is that the resulting query can be used as the base for other forms and reports. In the 2nd version of the database the search was performed through filters, which are difficult to reuse on multiple forms and reports; the resulting query can easily be used for this.
Making analysis – exchanging information with Excel
The search results page contained a button with which the search results could be transferred to Excel for further analysis. The code for the transfer is:
Dim oApp As Object
' Export the query to a spreadsheet file; the fifth argument (HasFieldNames) writes the field names as the first row
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel2000, "queryexportdata", "databasexport", True
Set oApp = CreateObject("Excel.Application")
This code transfers the data from the query queryexportdata to the file “databasexport”, which can be opened and edited in Excel. The query contains information about the companies returned by the query querySearchResults (Fig. 12).
Fig. 12 Queryexportdata: the query with the information that is exported to Excel
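Outside Access, the same kind of export can be sketched by writing the result of a query to a CSV file, which Excel can open. This is a different technique from DoCmd.TransferSpreadsheet and is only meant to illustrate the step. The function export_query_to_csv and the sample data are hypothetical.

```python
import csv
import io
import sqlite3

def export_query_to_csv(conn, sql, out):
    """Write the rows returned by `sql` to `out` as CSV, with the field names as the first row."""
    cursor = conn.execute(sql)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor)                                  # one line per record

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblcompany (Companyname TEXT, City TEXT)")
conn.execute("INSERT INTO tblcompany VALUES ('Alpha AB', 'Stockholm')")

buffer = io.StringIO()  # a real export would open a file with newline="" instead
export_query_to_csv(conn, "SELECT Companyname, City FROM tblcompany", buffer)
exported = buffer.getvalue()
print(exported)
```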
The Useful Tool – CMI Corporate Database 3.0
In order to make the interface more user friendly I wanted to change the Access buttons (which are built into the Access software) into clickable text with a web-like look. To make it clearer what can be clicked I also wanted the text to be underlined when the mouse pointer is moved over it. When the mouse is moved over a specific text box the following code is executed (example):
Private Sub Text32_MouseMove(Button As Integer, Shift As Integer, X As Single, Y As Single)
    If Me.Text32.FontUnderline = False Then Me.Text32.FontUnderline = True
End Sub
This routine changes the text style to underlined. When the mouse pointer leaves the text box and enters the background (the detail section of the form), the style is changed back through the following code:
Private Sub Detalj_MouseMove(Button As Integer, Shift As Integer, X As Single, Y As Single)
    If Me.Text30.FontUnderline = True Then Me.Text30.FontUnderline = False
    If Me.Text32.FontUnderline = True Then Me.Text32.FontUnderline = False
    If Me.Text35.FontUnderline = True Then Me.Text35.FontUnderline = False
    If Me.Text36.FontUnderline = True Then Me.Text36.FontUnderline = False
End Sub
This changes all the text boxes back to the non-underlined style. Without the If statements the routine would set the font style to not underlined on every mouse movement, which would slow down the application. Now it only executes the command if a text box really is underlined.
To make the user main page easier to follow I also made the “Show info about…” text box change automatically into e.g. “Show info about /Companyname/” (fig. 13). This was done by setting the control source of the text box to:
="Show info about " & [komb companyname]
Fig. 13 The user main page in CMI Corporate Database 3.0
To make it easier to navigate in the database, the text boxes that performed actions were placed on red and blue bars. The number of pages showing information about companies was reduced in version 3.0 compared to version 2.0, since it did not contain as many “no-info” boxes. The first page (fig. 14) showed general information about the company, such as address, classification and description. Page 2 (fig. 15) showed financial information and page 3 (fig. 16) showed information about cooperations with Karolinska Institutet. Information from tables other than tblcompany was shown in subforms, which were linked to the main form through the company name. The classification list box, for example, showed all classifications that a company had.
Fig. 14 General information page
Fig. 15 Cooperation information page
Fig. 16 Financial information page
If the user wanted to print information, they could press the print info button; a preview of the report was then shown, which could be printed (Fig. 17).
Fig. 17 The print preview page
Contacts with companies were registered in a specific form (Fig. 18). Here the user could easily choose which contact person at a company they had spoken to, enter their own name and type in a short note about the meeting, phone call etc. The date of the contact was also registered, so that the user could easily scroll through old contacts in chronological order.
Fig. 18 The register contact form
The search form was now divided into three pages (see Fig. 19-21).
Fig. 19 General search page
Fig. 20 Cooperation search page
Fig. 21 Classification search page
The user could type search criteria into the text boxes or select values from the list boxes. Double clicking on a list box deselected all values in it, and pressing the “Clear listboxes” button deselected the values in all list boxes. When the “Perform Search” button was pressed the search was performed and the result was presented on the Search Results form (Fig. 22). Here you could choose to export information to Excel, show and print detailed information about the companies, print labels (Fig. 23), print a list of the companies or get a short summary of the financial information about the companies.
Fig. 22 The search results form
Fig. 23 Preview of labels
Analysis of two examples
Companies owned by Karolinska Institutet
Karolinska Institutet owns companies through KI Holding AB. Today KI Holding owns stocks or stock options in 28 companies. The companies are:
Absorber AB
Accuro Immunology AB
Actar AB
Aerocrine AB
Affibody Technology Sweden AB
Appetite Control AB
BioStratum Inc
Creative Peptides Sweden AB
Diamyd Medical AB
Eurocine AB
Global Genomics AB
HUBIN AB
Independent Pharmaceutica AB
InDex Pharmaceuticals AB
KaroBio AB
Karolinska Education AB
Karolinska Innovations AB
Karolinska Investment Management Holding AB
Karolinska Research Services AB
Lipocore AB
Neuromics Inc.
Oncopeptides AB
Otre AB
RaySearch Laboratories
SIDEC Technology AB
Triple Crown AB
Virtual Genetics
Visual Bioinformatics AB
They are mostly very young companies; 21 of them were started between 1997 and 2001 (fig. 24).
Fig. 24. Starting year for companies owned by KI Holding AB (except Biostratum and Neuromics).
The companies work in many different therapeutic areas. The most common were genetics (7 companies), immunology/infection (6), endocrinology/metabolism (4) and cancer (4) (fig. 25).
Fig. 25. Focus areas for companies owned by KI Holding AB
The main products that are going to be sold by the companies are pharmaceuticals, which seven companies work with (fig. 26). Five companies sell consulting services and four sell biomedical knowledge. Four companies sell IT products.
Geographically there is a cluster of companies around KI Holding AB at Fogdevreten (Karolinska Institutet, Campus Solna); nine of the companies are situated there. KI Holding AB invests mainly in companies located in Stockholm (25 of 28 companies, Fig. 27). This is mainly because it tends to invest in ideas from researchers at KI, and these companies want to stay close to the research at Karolinska. Only two companies are non-Swedish (Biostratum and Neuromics in the US), but both were started by researchers from Karolinska Institutet.
Fig. 26. Product areas for companies owned by KI Holding AB
Fig. 27. Geographic location of companies owned by KI Holding AB
[Chart labels in Fig. 27: Stockholm 25, Uppsala 1, Durham 1, Minneapolis 1; Djursholm 1, Huddinge 1, Solna 1, Sollentuna 1]
Their economic development and impact on society are not within the scope of this thesis, but will be interesting to study in the future, especially in a couple of years when they have had the time to grow. There will probably be some failures, but maybe also a few that have become really strong companies with great impact on society in Sweden and around the world.
R&D Biomedical companies in Sweden
Based on the information in the database some interesting statistics regarding the biomedical industry in Sweden can be shown. Since the database does not have integrity regarding all biomedical companies in Sweden, the following example data do not give a complete and valid picture of the situation in Sweden. Instead they give information about the companies that are stored in the database. The reason for carrying out this example is to show the kind of analysis that could be made with the database in a study where data about all (or a statistical selection of) the Swedish biomedical companies are stored.
[Pie chart: Biotech focus areas in Sweden. Legend: Anesthesiology/Intensive Care, Bioinformatics, Cancer, Cardiovascular, Cell and Molecular Biology, CNS/Pain, Ear/Nose/Throat, Endocrinology/Metabolism, Eye, Gastrointestinal, Genetics, Geriatrics, Gynecology/Obstetrics, Immunology/Infection, Kidney/Urology, Lung/Respiration, Nutrition, Odontology, Pathology, Pediatrics, Skin/Woundhealing, Surgery/Orthopedics, Transplantation, Other]
Focus areas
Of the 182 R&D biotechnology companies in the database, 44 were registered as working with immunology/infection, making that focus area the most frequent in the database. The second most frequent was cancer (22 companies), followed by bioinformatics (17), CNS/pain (17), cardiovascular (16) and genetics (16) (Fig. 28). Among the most recently formed companies (1996-2000) most still work with immunology/infection (10). Here genetics has moved up to second place (6 companies), followed by bioinformatics (4) and CNS/pain (4) (Fig. 29).
Fig. 28. Focus areas for R&D biotech companies in Sweden.
Fig. 29. Focus areas for R&D biotech companies formed in Sweden between 1996 and 2000.
Geographic development
The number of companies in the database that work with R&D in the healthcare sector grew from 18 in 1975 to 180 in 1999. The expansion has been stable and rapid during the last 15 years. This is, though, just a count of the companies and does not capture the growth of the individual companies. At least it shows that the R&D healthcare sector is growing very fast in Sweden (Fig. 30). The expansion has mainly been in Stockholm, where the number of companies grew from 51 in 1995 to 70 in 2000, compared to from 95 in 1995 to 112 in 2000 in the rest of the country (Fig. 31). Of the 36 companies created during the last five years, 22 have been formed in the StandUp Region (Stockholm-Uppsala) (fig. 32).
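The arithmetic behind these figures can be checked with a few lines of Python. The sketch only uses the counts quoted in the text and assumes that the increase in the number of companies equals the number of newly created companies, i.e. that no company disappeared during the period.

```python
# Company counts quoted in the text (1995 -> 2000)
stockholm_1995, stockholm_2000 = 51, 70
rest_1995, rest_2000 = 95, 112

def growth(before, after):
    """Return (absolute increase, relative increase in percent)."""
    increase = after - before
    return increase, 100 * increase / before

stockholm_new, stockholm_pct = growth(stockholm_1995, stockholm_2000)
rest_new, rest_pct = growth(rest_1995, rest_2000)
total_new = stockholm_new + rest_new

# Share of the new companies formed in the StandUp region (22 of them, per the text)
standup_share = 100 * 22 / total_new

print(stockholm_new, round(stockholm_pct))  # 19 37
print(rest_new, round(rest_pct))            # 17 18
print(total_new, round(standup_share))      # 36 61
```

Stockholm thus grew by about 37 % against about 18 % for the rest of the country, and the two increases add up to the 36 new companies mentioned in the text.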
The figures in this example have shown that there is a rapid development of R&D healthcare companies in Sweden, mainly in the StandUp region. The development is mainly taking place within the immunology and genetics fields. Still, as was stated in the beginning of this example, these calculations are based on the companies stored at present in the database and are not the complete picture of the development in Sweden. It remains for further studies to make sure that the database has integrity for these companies. It will be interesting to see whether they give the same result as this preliminary study.
Fig. 30. Total number of companies working with Healthcare R&D in Sweden.
Fig. 31. Geographic development of newly formed R&D healthcare companies in Sweden between 1996 and 2000.
[Chart series in Fig. 31: Stockholm, Lund, Göteborg, Uppsala, Umeå; x-axis: Before 1996, 1996, 1997, 1998, 1999, 2000]
Fig. 32. Geographic distribution of R&D Healthcare companies in Sweden formed between 1996 and 2001
[Chart labels in Fig. 32: StandUp Region 61%, Lund 20%, Uppsala 18%, Göteborg 15%, Umeå 5%]
There are many things to be learnt from the creation of CMI Corporate Database. The first is the importance of detailed planning before executing a database project. The work should start with an analysis of which information you want to have in the database. This can be done through a prestudy where you collect information into a paper-based system and examine whether the information is sufficient for your purposes. It is crucial to identify who needs the information (the “customer” of the database) and to discuss the design of the database with them, so that you do not forget to include information that is important.
After the information to be stored has been evaluated, the work of structuring the database begins. In the work with CMI Corporate Database I started with an existing database and developed it further into the final 3rd version. This is not an optimal way of creating a database, since it is very difficult to make changes in an existing database. Instead you should start from the beginning and thoroughly think through the structure of the database.
First you should consider how the information is to be stored. An important rule of thumb is that information shall be stored only once in a database. To achieve that, most databases need to follow the relational model (see Relational Databases). Here information is stored in many tables, and information that is used in many places in the database is stored just once and then related to the other tables. The relational model is complex to set up but makes the administration and future changes of the database simpler.
When thinking through which information is to be stored you should also consider in which table each piece of information belongs. This is easy to forget, and you might then store the information in the wrong table. A typical example from CMI Corporate Database is the decision to store the information about turnover and number of employees in the main table (tblcompany).
The fields contained information from 1996 and 1999. If you now want to store information from 2000, you have to create a new field in the table and add that field to all the forms and to the search function. What should have been done instead was to create a separate table for this information. The table should contain a field identifying which company the information is about, a field for the date that the information refers to, one for the category of information that is stored (turnover, employees etc.) and a numerical field for the value (e.g. the number of employees). Had we done that, you could store information about any year of choice, and it would also be possible to store information about companies with other financial years. Another improvement would be that if you wanted to store other financial information (e.g. cash flow), you could just type in that category and it would be stored in the database. The search function could then automatically look through the table to find out which categories of financial information exist and use that as a base for the search form and function.
The same thing applies to all information that you at first think you want to store just once. You should be sure that you really want this information only once for each main record (company) before adding it to the main table. An example of this is the storing of annual reports in CMI Corporate Database. There you can store one annual report per company. You would probably like to store more than one annual report, but that is impossible in the 3rd version of the database. In the 4th version it will be possible to store more annual reports, because they will be moved to a separate table where each record contains a company identification, the year the report is from and the report itself.
Because this table is related to the main table through the company identification, you could add a combo box in the user form where the user selects the year of the annual report to be shown, and the database would show that report. It will take a lot of time to add this function to the database compared to how easy it would have been to add it from the start.
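The separate table for financial information proposed above can be sketched as follows. The sketch uses Python's sqlite3 module instead of Access, and the table name tblfinancial, the field names and all values are hypothetical; it only illustrates the design of one record per company, year and category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblcompany (CompanyID INTEGER PRIMARY KEY, Companyname TEXT);
    -- One record per company, year and category, instead of one field per year
    CREATE TABLE tblfinancial (
        CompanyID INTEGER REFERENCES tblcompany(CompanyID),
        FinYear   INTEGER,
        Category  TEXT,
        Amount    REAL
    );
    INSERT INTO tblcompany VALUES (1, 'Alpha AB');
    INSERT INTO tblfinancial VALUES
        (1, 1996, 'Employees', 12),
        (1, 1999, 'Employees', 30),
        (1, 1999, 'Turnover', 4.5);
""")

# A new year or a new category needs no change to the table structure
conn.execute("INSERT INTO tblfinancial VALUES (1, 2000, 'Cash flow', -1.2)")

# The search form can discover the available categories from the data itself
categories = [c for (c,) in conn.execute(
    "SELECT DISTINCT Category FROM tblfinancial ORDER BY Category")]

employees_by_year = conn.execute(
    "SELECT FinYear, Amount FROM tblfinancial "
    "WHERE CompanyID = 1 AND Category = 'Employees' ORDER BY FinYear").fetchall()

print(categories)         # ['Cash flow', 'Employees', 'Turnover']
print(employees_by_year)  # [(1996, 12.0), (1999, 30.0)]
```

The same pattern would apply to the annual reports: a table with a company identification, a year and the report, from which a combo box can list the available years.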
Planning and analysis are important, but another lesson from building the database is that you do not know which information you need until you have started working with the database. The viewpoints of the first users will also give you ideas that you never could have had from the beginning. Because of this it is extremely important to make the database structure as open and flexible as possible. An example of this is the classification of the companies in CMI Corporate Database. Before creating the 2nd version of the database the different possible classifications were studied. These were then used in the 2nd version. But since they were stored in the main table they were locked to those that were chosen from the beginning. Even though we thought that we had a good classification, we realised when typing in the information that some areas had been omitted. In a database such as ours, where we do not know how the industry will look in a couple of years (there will probably be companies working within areas that do not even exist today), it is not good to make this type of locked classification. In the 3rd version of the database we changed this and put the classification in a separate table, so that new classifications can be added as time goes on.
An analytical tool must be able to guarantee the integrity of the data (see Database integrity). The users of the data should then be able to use this as a search criterion in the database. The work of giving CMI Corporate Database integrity is difficult. Many of the companies are small, new companies are started and others go bankrupt. Today (2001-05-15) the database contains 194 R&D biomedical companies. This is not a complete picture of the number of companies in Sweden, but it can be compared to Ernst&Young’s estimate of 140 companies [4] in 1999.
Comparing it with the 239 companies in the R&R Medical Research Database and the about 500 (Nordic companies) in the Bioprint online catalogue shows that CMI Corporate Database is not complete regarding all R&D biomedical companies in Sweden. To make the database complete, further investigation has to be done. The database is complete regarding all companies owned by KI Holding AB. Here it will be interesting to get information from Karolinska Innovations AB, which is also a joint owner in some companies, to get the complete picture of which companies Karolinska Institutet owns.
The CMI Corporate Database is not valid regarding most of the companies in it, since there has not been time to contact all companies and have them verify their information. The database contains functions for creating e-mail lists or address labels to simplify that work, which will be done during the summer of 2001. When the companies return their information, they are classified as valid as of the date on which they sent it. You can then use this as a criterion in the search function and only include companies that have been validated after a certain date.
It is then possible to perform limited studies where you can guarantee the integrity by contacting all companies in a selected area. If you want to be able to draw further conclusions from the studies it is, as always, important to make a random statistical selection of which companies to include in the study. The database then gives you the possibility to mark these companies and use that as a criterion in your searches. Most of the information about the companies will probably already be in the database and be valid, but you can contact the companies for which it is not and validate the information (or add a new company) yourself. This way you can use information collected by other users, and by validating some companies for your own study you will do work that other researchers can make use of.
So by using CMI Corporate Database you can save time for yourself and help other researchers, which was one of the reasons for creating the database in the first place.
In a database the information should be stored in as small units as possible. This was unfortunately not always done in CMI Corporate Database. For example, the full names of contact persons were stored in one field. As a result you can only search for a person by their first name. If the name had instead been stored in two fields (first name and surname), you could choose to sort the names by either given name or surname. The same thing applies to all information in a database: all data should be stored in the smallest unique unit.
Before building a database you should also make a detailed plan of how the information is to be presented to the users. It is important to make a chart of the information flow in the database. You should also carefully plan the layout of the forms so that it becomes easy to navigate and to understand how to use the database, and perform tests as the database is being created to see whether the users can accomplish the tasks that they want to perform.
The presentation of statistics in the form of graphs or tables is in the 3rd version created in Excel. This gives the possibility to make advanced illustrations of the data. A problem is that it requires some work by the user. Many users probably just want a simple graph displaying the results. In Access it is possible to create graphs automatically. This should be a part of the 4th version of CMI Corporate Database.
Since CMI Corporate Database 2.0 lacked many functions that we wanted for the database, not many user tests were done. Before the 3rd version of the database is upgraded, a series of user evaluations must be performed. The graphical user interface is supposed to facilitate the use of the database, but we do not know today whether novice users are able to use the database directly or whether they need to be trained in using it. A possible solution is to create a help function and a tutorial. This would take 2-3 weeks to create. The user tests will be performed during the summer of 2001 and will be evaluated before the creation of CMI Corporate Database 4.0.
In the 4th version the database will probably be published on the Internet. Before doing that, a lot of questions about security on the Internet have to be solved, for example:
1. How do you publish a corporate database on the Internet?
2. What information do we want to publish to the public?
3. How do we create security levels so that people who should have total access to the database can see all information but the normal user cannot?
4. How do we make sure that the database does not get “stolen” (copied and used by non-authorised users)?
5. PUL sets strict conditions on which information may be disclosed to a third country. Probably all personal particulars have to be removed from the Internet version of the database.
6. How do we keep the Internet version updated?
CMI Corporate Database 3.0 is a useful tool for research and matchmaking tasks. Much work still remains, however, to give the database better integrity and to make it easier to use.
Backlund, Anna et al. 2000. Det svenska biotekniska innovationssystemet. Arbetsrapport maj 2000. Stockholm: NUTEK
Utbildningsdepartementet. 2000. Att spränga gränser – Bioteknikens möjligheter och gränser. SOU 2000:103.
Sveriges Tekniska Attachéer. 2000. Innovation och tillväxt i bioteknik – en global studie. Stockholm:STATT
Ernst&Young. 2000. European Life Science Report 2000, Evolution.
Kneller, Robert. 2001. Technology Transfer: A review for Biomedical Researchers. Clinical Cancer Research. 7(4), 761-774
Wilkinson, John. 2001. The role of technology transfer in biotechnology. Journal of Commercial Biotechnology 7(3), 218-223
Henrekson, Magnus and Rosenberg, Nathan. 2000. Incentives for Academic Entrepreneurship and Economic Performance: Sweden and the United States. in G. Törnqvist and S. Sörlin (eds.) The wealth of knowledge – universities in the new economy. Forthcoming at SNS Publishers, Stockholm.
Hall, Zach W. and Scott, Christopher. 2001. University-industry partnership. Science 291(5504), 553
Motro, Amihai. 1989. Integrity = Validity + Completeness. ACM Transactions on Database Systems 14(4), 480-502
Klander, Lars and Mercer, Dave. 1999. Access 2000, developer’s Black Book. Coriolis Technology Press pp14-15
Sveriges Riksdag. Personuppgiftslagen. SFS 1998:204 1§
Sveriges Riksdag. Personuppgiftslagen. SFS 1998:204 9§
Laage-Hellman, Jens. 1998. Den biomedicinska industrin i Sverige. B 1998:8:22. Stockholm: NUTEK
Codd, Edgar F. 1970. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6), 377-387