yesterday. (1) What database approach(es)? (2) What tools? (3) How to collaborate on it (them)?

Tishkoff: Addressing one question that has been raised, this is not a PrIMe-focused discussion unless you think it should be. Many activities are already underway. Also, questions of money and international information-transfer restrictions can be taken off the table for the present discussion; MACCCR will deal with those separately.

Preservation of what has been done and used

Bulzan: Screening of data and methods still has to be a central concern.
Frenklach: That is a central and motivating concern of PrIMe. We want to accept all data initially – no censorship, just review for format. Subsequently, rigorous analysis to generate an opinion of what is best, still with latitude to disagree.
Ruscic: Two steps: moving data in from the literature, then getting new information in (required by journals, etc.?). Then evaluation should be done, in contrast with simple data collections.

"Approaches" and data formats

Pope: Let's make sure to discuss "approaches" in the plural form.
Ruscic: For example, in addition to user-inputted data, include data "hooks" (automatic collection).
Golden: That is a separate matter. I want a core approach, though, and often a single result. For example, I don't want multiple values of heats of formation.
Pope: True, but recognize that for different experiments, models, properties, and data, very different formats of information are needed.
Lindstedt: We also need to allow for different forms of engagement with the information. Yet we benefit from having standards for data format.
Brezinsky: From my point of view as a kineticist, I have an available format that works, and that is PrIMe. I intend to go ahead and use it. Then let's see.
Frenklach: Does a multiplicity of databases aid or hold back the field? Of course, anyone who wishes to can create a format and a governance structure; autonomous databases are okay. However, can we make sure they will map to each other? To do so, we need some standards for data format. PrIMe's focus has been creating a data model, a format.
Ruscic: Computer and information scientists have been hard at work on automatic data capture and categorization. Our scientific challenge is making sure the standard used for data formats is sufficiently flexible. I agree that different databases / approaches are fine.
Golden: The number of ways of doing things shouldn't be limited, but the data are the data. Perhaps there should be a single Active Thermochemical Tables database (Ruscic: or not?). Still, we need a standard format and a group to manage it. Google is a way to find information, but it isn't the model for us to follow.

Needs of the community

Ruscic: Consider two types of users: one who just wants a number and one who wants to know the choices and the decision process.
Brown: Likewise, the evaluation process has to be very transparent. It leads directly to the question of governance.
Tsang: As we move toward data for real fuels, data are much sparser. Something cleanly defined like GRI-Mech is too specifically defined for that fuzzier information.
Frenklach: True, the number of reactions will go through the roof; but then how do we compare among them?
Tsang: Long-term, maybe we can calculate everything, but what about the intermediate term?
Golden: The solution is to capture all the rate constants and find a way to automatically generate a "best" reaction set.
Lindstedt: The way we do it now is to review all the choices and try to forge/evolve a reaction set. We have to adapt this approach, not discard it.
Golden: The community should decide what is the best rate constant.
Lindstedt: A strength of the TNF workshops is to allow individual development but also to have focused dialogue every two years.
How can we exploit advances in computing power? (Richards)

Frenklach: Simply increasing the size of the reaction set we can build is not adequate; it is not convergent, due to increasing degrees of freedom and amounts of uncertainty.
Richards: Building on past simulations, past databases, and past reaction sets is a logical model of advancing.
G. Smith: Remember storage; such archiving would require storing all of that.
Knyazev: Generating a situation-specific mechanism [as for reduced mechanisms for a CFD domain, etc.] is fine, too, but it is important to make sure the base data are there.

How does this situation compare to past organizational experience like NASA atmospheric chemistry? (Law)

Golden: That activity is a success, but it's a thirty-year-old "19th-century success," very much personal and based on judgment. The concession is that we used to hand each other papers, but now we bring our laptops. The reason I'm involved in PrIMe is because it is the way I wish we could do the other. Experts recommend the best current value, but the capability of disagreeing is important, and cyberinfrastructure can aid both objectives.
G. Smith: A key question here is whether you want formal evaluation panel(s) to be set up. The task seems too vast. Cyberinfrastructure allows more people to be involved in the process.
Knyazev: In atmospheric chemistry, it is expected that you will start from the NASA set and then examine alternatives relative to it.
G. Smith: Again, back to whether you want a formal panel (or panels) or a group of volunteers as in PrIMe.
Pope: "Communities" is more appropriate than "community." Consider three:
1) Chemical kinetics community – has talked about it a long time
2) TNF – has worked out a simple CI approach, successfully and yet without specific funding
3) DNS / turbulent combustion – has very different issues

Where to go?

Tishkoff: Combustion can have a virtual community of virtual communities. Break out into two groups, one on chemistry and one on reactive-flow / turbulence modeling.
Breakout session on reactive-flow / turbulence modeling

What technical issues are there?

The massive size of DNS databases is a hindrance to sharing the solutions:
1) Storage of the data
2) Access to the data
3) Post-processing of the data
4) Visualization of the data
Because the storage requirements are so large, data sets generally are not moved from one site to another. Also, transferring large files across existing networks can be very time-consuming. The data sets also generally require a large number of processors (hundreds) to post-process and work with. If researchers do not have their own clusters, their options are limited. Software to post-process the data may be tied to the hardware on which the solution(s) were generated.

Collateral issues
1) Researchers often have to travel to the site to work with the data.
2) There may be a learning curve to work with the software and the setup of the computing center.
3) Researchers may have to write their own software to post-process the data for their specific research needs. (This is also true in general.)
4) Researchers may have to apply for accounts on the site computer system and apply for computer time. This may take (many) months.
5) Cost to travel and remain on site for extended periods.
6) A researcher may need to return to the site at a later date as new research ideas come to mind.

A partial solution might be to use subsets of the data, perhaps a hierarchical structure of sets and subsets. For example, for an axisymmetric geometry, a researcher may need only mean properties along radial lines at several axial locations. A larger subset may be unsteady data along selected lines (e.g., the axis). Which set or subset of the data a researcher uses is left up to each individual, based upon his or her needs and local resources. Subsets of the data would allow researchers to access data quickly without many of the restrictions listed above.
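The hierarchical-subset idea can be sketched in a few lines. This is only an illustration, not part of any existing DNS database: the array shape, variable, and station indices below are hypothetical stand-ins for a real solution field.

```python
# Minimal sketch (hypothetical names and shapes): extracting hierarchical
# subsets from a large DNS field so that only what a researcher needs is
# transferred, rather than the full solution.
import numpy as np

# Stand-in for a full 3D snapshot: one variable on an (x, r, theta) grid.
rng = np.random.default_rng(0)
full_field = rng.random((64, 32, 16))   # the "full" data set

# Level-1 subset: azimuthally averaged (mean) profiles along radial lines
# at a few selected axial stations -- tiny compared to the full field.
axial_stations = [8, 24, 40, 56]
radial_profiles = {i: full_field[i].mean(axis=-1) for i in axial_stations}

# Level-2 subset: unsteady/line data along a single selected line (the axis).
axis_history = full_field[:, 0, 0]      # all axial points at r = 0

# Relative sizes illustrate why subsets travel while full fields stay put.
print(full_field.size, sum(p.size for p in radial_profiles.values()),
      axis_history.size)
```

Which level of the hierarchy a researcher requests would depend, as noted above, on the question being asked and on local resources.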
Unit problems / reference experiments

There was additional discussion about including selected unit problems in a database, by analogy to Flame D in the TNF Workshops. This would include:
1) Clear definition of the geometry
2) Computational grids
3) Specification of all boundary and flow conditions
4) Experimental data (or links to the data)
5) Solutions

To design a suitable unit problem, one needs to assemble a small group of computationalists, diagnosticians, and experimentalists. They would have to deal with additional issues, including:
• The database should include metadata (additional information about the data) along with the actual data
• What sorts of data and information are needed by computationalists?
• What data can be measured?
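As a sketch, a unit-problem record might gather the five components plus metadata in one structure. All field names, file names, the URL, and the numerical values below are hypothetical illustrations, not an actual TNF or PrIMe schema:

```python
# Hedged sketch of a "unit problem" record, assuming the five components
# discussed above plus a metadata block; every name and value here is
# illustrative, not a real schema or measured specification.
unit_problem = {
    "name": "piloted jet flame (Flame D analogue)",
    "geometry": "axisymmetric jet, hypothetical nozzle diameter 7.2 mm",  # 1) geometry
    "grids": ["coarse_64x32.msh", "fine_256x128.msh"],                    # 2) computational grids
    "boundary_conditions": {                                              # 3) boundary / flow conditions
        "inlet_velocity_m_s": 49.6,
        "coflow_velocity_m_s": 0.9,
    },
    "experimental_data": "https://example.org/flameD/data",               # 4) data, or links to it
    "solutions": ["RANS_case1.h5"],                                       # 5) archived solutions
    "metadata": {                                                         # information about the data
        "contributed_by": "focus group",
        "measurement_uncertainty": "see notes",
    },
}
print(sorted(unit_problem))
```

A record like this makes the focus group's job concrete: each key is a deliverable that the computationalists, diagnosticians, and experimentalists must agree on.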
Summary of Chemistry Breakout Session
Jeff Manion

Overview

There was an underlying consensus among all involved regarding the need for some sort of cyberinfrastructure (CI) to further the development of chemical models of combustion. Participants were generally supportive of utilizing PrIMe or a PrIMe-like structure to further this goal. There was less clarity on how the existing structure should evolve or how best to meet our common goals. There was discussion of both development of the "vision" and the need to attack practical problems. There remains a significant amount of uncertainty about what the CI should ultimately look like. Nonetheless, there was substantial agreement that the immediate steps have to be to: 1) get the data in order and 2) apply the infrastructure to practical problems as a method of determining what works and what doesn't.

A kinetics database as an objective

Ed Law: It is time to nail down the reaction rates for combusting systems. This needs to
include a comprehensive measurement program as well as cyberinfrastructure (CI). CI alone cannot solve all the problems, and we should not claim it will. We need a concerted national effort to address all facets of the problem.
Dave Golden: It also needs to be recognized that you cannot "nail down" the reaction rates once and for all and then move on. Our knowledge of reaction rates will continue to evolve, and this needs to be reflected in the database of rate data. It is the function of the evaluated rate "Library" to reflect our evolved knowledge and the job of the working groups to carry out the necessary evaluations.
Greg Smith: Yes. Because of this, the question of how to structure the evaluations and maintain the momentum of the working groups is a key issue that has not been fully resolved.
Michael Frenklach: A point of having the working groups and a developed cyberinfrastructure is to prevent unnecessary duplication of effort.
Dave Golden: Potential models of the evaluation effort are those of GRI-Mech, the Baulch evaluations [IUPAC Subcommittee for Gas Kinetic Data Evaluation, http://www.iupac-kinetic.ch.cam.ac.uk/index.html], and the NASA Data Panel [http://jpldataeval.jpl.nasa.gov/]. The latter two are still active, and the work is being done by experts with minimal funding and mostly donated time.
Branko Ruscic: One has to be careful; there are differences in the number of reactions, etc. You also need to look at how slow those processes are. There is a need for more efficient, more automated methods.
Wing Tsang: One also has to be cautious when making direct comparisons with the methodologies and experience with models of small-molecule fuels, as with GRI-Mech, because the database is much, much smaller for real fuels.
Unknown: It will be important to define the process and avoid repeating past mistakes.
Wing Tsang: Traceability of information is crucial. It needs to be "on paper."
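Tsang's traceability point can be made concrete with a minimal sketch of a rate-coefficient record that carries its provenance alongside the numbers. The reaction, parameter values, DOI, and field names below are illustrative assumptions, not evaluated data or an actual database schema:

```python
# Hedged sketch: one way a rate record could keep every number tied to
# its source and evaluator. All values here are illustrative only.
from dataclasses import dataclass
import math

R = 8.314462618e-3  # gas constant, kJ/(mol*K)

@dataclass
class RateRecord:
    reaction: str
    A: float            # pre-exponential factor
    n: float            # temperature exponent
    Ea: float           # activation energy, kJ/mol
    source: str         # literature reference (the "on paper" trace)
    evaluated_by: str   # who reviewed / accepted the value

    def k(self, T: float) -> float:
        """Modified Arrhenius rate coefficient k(T) = A * T**n * exp(-Ea/(R*T))."""
        return self.A * T**self.n * math.exp(-self.Ea / (R * T))

rec = RateRecord(
    reaction="H + O2 -> OH + O",
    A=1.0e11, n=0.0, Ea=60.0,        # hypothetical numbers, not evaluated values
    source="doi:10.0000/example",    # hypothetical reference
    evaluated_by="working group",
)
print(f"k(1500 K) = {rec.k(1500.0):.3e}")
```

An automated evaluation process could then operate over many such records for the same reaction, with the `source` and `evaluated_by` fields preserving the audit trail that a simple table of numbers loses.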
CI role in creating mechanisms

Vadim Knyazev: The need is really to have a set of accepted values so that everyone is starting from the same place. The Combustion Institute could think about implementing a policy that would require, e.g., everyone building mechanisms to begin with an accepted set of values, or at least to define where the numbers come from.
Hai Wang: Many of the questions we have discussed so far are too global, in my opinion. From a personal historical perspective, I began by building mechanisms from scratch after extensive searching of the literature. I don't have the time to do that any more. I can teach a student, but it will take the student maybe three years to learn all the tricks and get good at it. And of course the student will leave soon thereafter. And now we are talking about combining models and building mechanisms for much larger fuel systems. This is even
more difficult. I think we need to approach this from a practical perspective. Don't ask whether the vision is this or that, but simply how to build the mechanisms we need.
Wing Tsang: I agree with Hai. This is not a question of developing grand schemes. We need short-term deliverables that noticeably advance the field. The calculations of Don Burgess, for example, will provide a self-consistent set of thermodynamic data that will cover most hydrocarbon combustion systems. Actively working to meet such defined targets will bring clarity to what is truly needed.

Strategies and tactics for using CI

Ed Law: Practical issues are important, but we should not ignore the long-term vision. Now is the time to think about long-term strategies if we want long-term success.
Greg Smith: Whatever its precise nature, it is important that the cyberinfrastructure we develop be flexible and extensible.
Ken Brezinsky: I'm not sure what I ultimately might want. To start with, I simply want fast access to data and fast dissemination of new results.
Dave Golden: It is important to differentiate between individual and group research. We need a 21st-century approach.
Ken Brezinsky: Yes, but that is the next step; right now I just want fast access to data.
Nancy Brown: A key aspect of cyberinfrastructure is automation of bookkeeping chores so that one does not waste intellectual horsepower on the mundane.
Michael Frenklach: What we need right now is a base level of agreement. We need a common way of storing data. Organizing the data is the first step. The term is a common data model. After that, different people can use the data however they want, e.g., develop different optimization methods.
Jeff Manion: It is important not to confuse the issues of data collection, storage, and viewing with those of data processing and data evaluation. These can be largely separated as long as the data can be easily imported and exported in defined and well-considered formats.
It is important that the information be available in an easily accessible and flexible manner to promote its use throughout the entire community.
Hai Wang: As I see it, a key strength of any cyberinfrastructure such as the PrIMe effort is to provide a structure for discussing data. It provides a forum for the group to decide what the best data are, "somehow." The "somehow" can be an evolving process.
Ken Brezinsky: … a sort of chemical-kinetics chat room.
Michael Frenklach: It can be more than that. One can use the same machinery to present new ideas to the community.

Issues of archival credit
Ken Brezinsky: That raises some questions. The old scientific model was evolution of ideas through a cycle of articles. At least with that methodology I get credit for publications even if the idea is ultimately wrong. It is not obvious how that applies to publication of ideas through the cyberinfrastructure.
Dave Golden: This is a serious question. In the past, certainly, publications were a key determinant of getting raises, promotions, tenure, etc., and those are important incentives, necessary to the continued health of the community.
Bob Santoro: People can still get credit based on community opinion. That may be the face of the future. Perhaps the future will not be based on the current model.

Data evaluation issues

Unknown: An important function of the cyberinfrastructure is to prevent people from using bad data – to tell people, "you can't use this."
Vadim Knyazev: I don't see it quite that way. It is important to have a standard model, but one can deviate from that standard if needed.
Michael Frenklach: In fact, that is precisely the function currently filled by "standard" models such as GRI-Mech. Everyone starts from GRI-Mech because it is accepted by the community. The problem is that there is no process for updating standard models, and most models have a useful lifespan of ten years or less.
Unknown: In some ways, what we are talking about is a Wikipedia for combustion. It would contain generally accepted information. If it is not correct or you disagree, you could change it, or at least there would be a mechanism or process for suggesting changes to the accepted knowledge base.

What data to store – and where?

Unknown: An issue that will have to be better explored is exactly which experimental information should be stored for the future. How much of the raw data should be stored?
Greg Smith: The problem is with too much information. The initial focus has to be on the derived data.
Ed Law: I think it would be desirable to have the raw data there.
Jeff Manion: There is a significant cost to entering data. Either you need to continuously dedicate people and money to data entry, which seems unlikely in the near term, or people have to do it for free. If the task is too onerous or time-consuming, it is not going to happen.
Branko Ruscic: Once the data-entry system is fully developed, it may not be too difficult to collect future data. A big challenge is going to be the historical data, because of the amount of effort it would require to collect and input.
Unknown: The type of data will determine whether or not it is worth collecting. For instance, it is probably not worth collecting models from even ten years ago, as they will have evolved and be past their useful life span.
Wing Tsang: Although the question of support is nominally not on the table, it is an issue that could determine the success or failure of the combustion cyberinfrastructure. It is possible that the Advanced Technology Program could be a source of funding. Also, where should the database reside? Will and should the Combustion Institute take over PrIMe, or whatever PrIMe evolves into?
Final open discussion: What should be the next steps?

Setting up databases and software

Manion: It's time to get the programmers working.
Golden: Plenty of our students are great programmers and go to work for financial houses and pharma companies, but we're still focused on their doing combustion research in their studies. In general, we don't want to direct them to focus on building a lot of different codes for databasing.
Bellan: Students aren't going to get degrees for inputting data, after all. [Like publishing, disseminating their data into a database might be an expected part of it, though.]
Seshadri: Agencies need to get behind sponsoring just such database-development activities, though.
Brezinsky: I need students who are expert in this, that, and the other – experiment, computation – and I'm going to get better value by having them put results into PrIMe. Others need to as well, and after initial experience, we need a "lessons learned" review opportunity.
Frenklach: It is necessary to bring in not only computer-savvy users but also the information-science, programming-focused people.
Ruscic: If you have the funding, you can make it happen; programmers cost money (e.g., for tying the Active Thermochemical Tables to PrIMe).

What form of organization should be pursued?

Lindstedt: Your problem is not uncommon. If the cyberinfrastructure initiative is an opportunity to make such things happen, then we should take it, and five years from now you will see the change.
Fotache: In a similar situation, we created a standing committee of experts and
sponsors, then focus groups to find key points of dissatisfaction/needs. The end result was a pilot program of users, now institutionalized – and evolving. At least in some communities, like PrIMe, you seem to have the critical set to do just that – and you have to. Sure, you have to envision a transition, like to the Combustion Institute, but you have to act.
Law: Organize a roadmap for comprehensive experiments and cyberinfrastructure for combustion chemistry.
Trouvé: In the turbulence community, more integration is needed of people than of tools and data, although those are needed, too. PrIMe might provide technology for some of it.
Brown: It would be the wrong approach to [completely] separate the fluid-mechanics people from the chemistry people.
Golden: From a cyberinfrastructure perspective, the challenge is identifying/developing a community-accepted data model.
Bellan: Very different problems, though; it takes a day to download my data! Different models are apparently necessary.
Ruscic: We need fast-action modes for developing responses as they are identified.
Frenklach: With PrIMe, we have generated special-interest task forces to deal with such diversity. For example, there is a new group on Cybernumerics led by Phil Smith.

What are the actionable items now?

Tsang: We will plunge into using PrIMe for the next year.
Santoro: With Mel Roquemore, we propose to put the SERDP Soot program into PrIMe.
Tishkoff: If that's the preference, can PrIMe accommodate it?
Frenklach: Yes, but it is a matter of allocating our limited resources at this point, based on priorities.
Lindstedt: With Rob Barlow, we're working to migrate TNF into PrIMe. We've begun exploratory talks.
Brezinsky: As I said this morning, I intend to go ahead and begin using PrIMe. This sort of digital sharing is where the world is moving, and we have to.

There seem to be strong commitments to use PrIMe. What are other issues?
Westmoreland: Two other issues came up in the last few days: (1) Research groups might like to submit data but aren't ready for formal PrIMe community submission. There seems to be a solution already: Zoran Djurisic indicated that PrIMe can accommodate that need through private "rooms" allowing use of the PrIMe data model. (2) Another is the turbulent-combustion group's need for a (few) standard physical
realization(s) for focusing experiments and modeling, analogous to TNF Flame D or to flat flames or shock tubes – the "unit problems." Defining them requires organizational activity from that community by a focus group, probably meeting physically.
Manion: Other needs seem apparent that are not yet met by PrIMe: (1) Information like the NIST Kinetics Database's data collection is needed but is not available in PrIMe; likewise Don Burgess's collection and visualization. Long-term commitment is still a necessary matter to deal with. If PrIMe takes over this function, NIST would not likely continue it. If PrIMe then goes away, we'd be left worse off. Also, it might be privatized in an unacceptable way. (2) A cost issue is that the existing PrIMe code requires purchase of MATLAB, which is quite expensive outside academia.
Frenklach: That is a timely question, because we have a solution to this problem. As early as next week, we will make available a compiled, MATLAB-independent version for distribution. It may have to be compiled for different platforms, but it will initially be available for PCs. The MATLAB-based source code will still be available.
Ruscic: Remember thermochemistry and its needs, which differ from those of kinetics.
Violi: What about depositories for model results? What about access to the models themselves? Those features are necessary.

How do we make sure "Discovery" is part of the activity?

Violi: Consider NSF's FY08 "Cyber-Enabled Discovery and Innovation" initiative. Make sure the Discovery part is in there.
Ruscic: Consider the Active Thermochemical Tables. Yes, they analyze data, but they also point to the key experiments that are needed.
Golden: "Discovery" in engineering goes beyond scientific discovery. It must include innovation and discovery of solutions to problems, too.
Wang: Proper inclusion of uncertainty is critical to new-experiment design and to predictive engineering.
Frenklach: One of the purposes of thorough modeling is discovery – of new chemical mechanisms.
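Wang's point about uncertainty can be illustrated with a minimal Monte Carlo sketch that propagates assumed parameter uncertainties into a rate prediction. The Arrhenius parameters and their spreads below are hypothetical, chosen only to show the mechanics:

```python
# Hedged sketch: propagating rate-parameter uncertainty into a model
# prediction by simple Monte Carlo sampling. All numbers are illustrative.
import math
import random

random.seed(1)
R = 8.314  # gas constant, J/(mol*K)

def k(A, Ea, T):
    """Arrhenius rate coefficient k = A * exp(-Ea/(R*T))."""
    return A * math.exp(-Ea / (R * T))

# Nominal parameters with assumed (hypothetical) 1-sigma uncertainties.
A_nom, A_sig = 1.0e13, 0.2e13
Ea_nom, Ea_sig = 150e3, 5e3      # J/mol

T = 1200.0
samples = [k(random.gauss(A_nom, A_sig), random.gauss(Ea_nom, Ea_sig), T)
           for _ in range(10000)]
mean = sum(samples) / len(samples)
spread = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5

# The relative spread is the quantity of interest: repeated at different
# conditions, it shows where a new experiment would most tighten predictions.
print(f"k mean = {mean:.3e}, relative 1-sigma = {spread / mean:.2f}")
```

The same loop run over a range of temperatures would indicate which conditions dominate the predictive uncertainty, connecting the database's uncertainty fields to new-experiment design as Wang suggests.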
Chen: Remember the need for critical physical properties as well. For example, we need the surface tension of ammonia and were surprised at the difficulty of finding any data. We need to include the search for new findings there.

How do we make sure practical applications can benefit?

Chen: From the applications side, we'd like to explore the link between the application
utility and the scientific findings. The applications community needs to have input – continuing input – into these scientific developments. For example, if we need high-pressure results but only low-pressure results are available, we may use them and make a mistake by doing so.
Tishkoff: Hukam Mongia (GE) is interested in identifying a set of "unit experiments" that would be useful configurations. A focus group of interested people could have a big impact on guiding such initiatives.
Gillette: NASA has strong interest in identifying such unit problems.
Bulzan: The question is whether those can be organized and disseminated more effectively [by using cyberinfrastructure]. Of course, there may be important commercial opportunities in making use of such data.