Extreme-Scale Computing Research and Development

FastForward 2 Request for Proposals

All proposals are due on or before May 9, 2014, 5:00 p.m. PDT

Interested Offerors are advised to monitor this website for potential FastForward 2 RFP amendments and other FastForward 2 RFP information updates. LLNS may notify interested Offerors (who have previously contacted LLNS and expressed an interest in the FastForward 2 RFP) of updated FastForward 2 RFP information via e-mail; however, LLNS is under no obligation to do so. It is the responsibility of all interested Offerors to monitor this website for current FastForward 2 RFP information.

Memory Technology Request for Proposal B608045

Contract Administrator: Brandt Esser
Phone: 925-423-1518
E-mail: esser3@llnl.gov

RFP Letter B608045

Node Architecture Request for Proposal B608115

Contract Administrator: Brandt Esser
Phone: 925-423-1518
E-mail: esser3@llnl.gov

RFP Letter B608115

Interested Offerors must submit all communication (questions, comments, etc.) about the FastForward 2 RFPs to the Contract Administrator.

The FastForward 2 RFP objective is to initiate partnerships with multiple companies to accelerate the R&D of critical technologies needed for extreme-scale computing. It is recognized that the broader computing market will drive innovation in a direction that may not meet DOE's mission needs. Many DOE applications place extreme requirements on computations, data movement, and reliability. FastForward 2 seeks to fund innovative new and/or accelerated R&D of technologies targeted for productization in the 2020–2023 timeframe.

FastForward 2 RFP Components

RFP Questions and Answers

Q1: Are there any page limitations for proposals in response to RFP Numbers B608045 and B608115?
A1: Technical proposals shall not exceed 25 pages in length, with a minimum font size of 11 points and margins no smaller than 1 inch on all sides. Timelines and milestone schedules are considered to be part of the 25-page limit. Bios of key investigators, a brief cover letter, and a table of contents are not included in the page limit.

Q2: What are the consequences of submitting a proposal that does not address all mandatory requirements?
A2: Proposals that do not address all mandatory requirements will not be considered.

Q3: The Statement of Work refers to a proxy app called TORCH in Section 6.4, page 15, second line from the bottom. The link connects to an old web page. What is the correct link?
A3: The correct link is http://crd.lbl.gov/groups-depts/ftg/projects/previous-projects/torch-testbed.

Q4: We have a reference document that will not be available on any website until late summer and is far too large to include within our 25-page limit. Can we include reference materials of this type with our technical proposal?
A4: No. Adding a large unpublished reference constitutes an appendix to the proposal. The technical proposal should explain the proposed work and convince the reviewers that it is worthy of funding. Reference materials must be summarized within the 25-page limit.

Q5: At the end of SOW Section A1.2.6 is a sentence fragment: “Development of a target independent programming system.” Please explain.
A5: This is extraneous text and should be ignored.

Q6: Between now and the time funding decisions for FastForward 2 are made, is it acceptable for us to have discussions with people in the Department of Energy related to our current FastForward project (i.e., as part of our working group meetings or co-design activities) as long as we do not discuss FastForward 2?
A6: Yes.

Q7: What options are available if two companies want to work together on a FastForward 2 project? For example: (a) a joint proposal but with a separate contract for each company, (b) one company leads and the other subcontracts, or (c) separate but coordinated proposals.
A7: Option (b).

Q8: Section 1 of the FastForward2 Draft Statement of Work states, "The Node Architecture focus area broadens the previous FastForward focus on Processors to include the entire architecture of a compute node. Both the node hardware and any necessary enabling software are in scope. A Node Architecture research proposal can also include several focus areas." Section 7.2 of the FastForward2 Draft Statement of Work states: "Offerors wishing to conduct research on Node Architecture that includes more than one focus area shall submit a single proposal that specifies the dependencies between the focus areas. The proposal budget shall clearly specify the budget for each focus area." What is meant by focus area? For example, are Energy Utilization and Component Integration separate focus areas?
A8: Node Architecture and Memory Technology are the two topics of this RFP. Work that spanned both topics would require two separate proposals. Energy Utilization and Component Integration are areas of interest. An Offeror proposing work in both areas of interest must submit one proposal with dependencies identified and budgets specified separately.

Q9: Section 8.4 indicates "three-year subcontract period of performance" but elsewhere the SOW indicates the contract is for 27 months. Which is correct?
A9: The correct period of performance is 27 months.

Q10: Section A1-1.4 and A1-2.4 deal with "On-Chip and Off-Chip Data Movement." However, the target requirement given in A1-5.4 is listed as "On-Chip Data Movement" (Off-Chip is not specified) and focuses on balancing memory bandwidth and capacity. Is this an error in the Draft Statement of Work? Are techniques related to off-chip data movement within scope if they occur within the node?
A10: The title of Section A1-5.4 should be "On-Chip and Off-Chip Data Movement." Both are in scope.

Q11: Sections A1-1.5 and A1-2.5 discuss concurrency, but concurrency is not listed as a target requirement. Is this an error in the Draft Statement of Work? Are techniques designed to handle concurrency within scope?
A11: Concurrency is in scope.

Q12: Can you clarify what is meant by the statement "While node architecture includes processing near memory, processing in memory components that are independent of the integrated node technology are not in scope"?
A12: The intent of this statement is that processing in (or near) memory (PIM) technologies are in scope for a Node Architecture proposal if they are an integral part of the overall node architecture being proposed. Developing independent PIM components would be in scope for a Memory Technology proposal if the components could be integrated into multiple vendors' node architectures. PIM technology that is not proposed as part of an overall node architecture and that cannot be integrated into multiple vendors' nodes (i.e., proprietary components for an unspecified node) would be out of scope for this RFP.

Q13: What is meant by this metric: "Efficient operation as measured by a weighted sum of time and energy to solution, chosen to approximate the likely balance of capital and operating expenses for a node." Can you confirm that the following is correct? "This appears to be a simplified TCO analysis, e.g., [CAPEX per node/(usable lifespan of a node) + OPEX/unit-time] to get a cost per unit time.  Power is a proxy for OPEX. I think solution might mean an application running start-to-finish. Flops are a part of that but if it takes hours to reconfigure the machine to run a 5-minute job, that would need to be included."
A13: The purpose of this metric is to address the balance between computational performance and energy usage. An analysis that considered only the capital and operating expenses (which include energy-to-solution) and not application performance (time-to-solution) would not be responsive. Likewise, an analysis of application performance that disregarded capital and operating expenses would also be nonresponsive.
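
One way to picture a weighted sum of time and energy to solution is sketched below. The weights, the sample runs, and all names are hypothetical and chosen only for illustration; the SOW does not specify them here. The sketch shows that the ranking of two designs can flip depending on how time and energy are weighted, which is why an analysis that ignores either term would not be responsive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One application run on a candidate node design (hypothetical data)."""
    time_s: float    # time to solution, in seconds
    energy_j: float  # energy to solution, in joules


def weighted_cost(run: Run, w_time: float, w_energy: float) -> float:
    """Weighted sum of time and energy to solution.

    w_time and w_energy are assumed weights (cost per second and cost per
    joule); they stand in for the capital/operating balance and are not
    values taken from the RFP.
    """
    return w_time * run.time_s + w_energy * run.energy_j


# Two hypothetical designs: one faster but hungrier, one slower but leaner.
fast_hot = Run(time_s=100.0, energy_j=50_000.0)
slow_cool = Run(time_s=150.0, energy_j=30_000.0)

for w_energy in (0.002, 0.004):
    a = weighted_cost(fast_hot, 1.0, w_energy)
    b = weighted_cost(slow_cool, 1.0, w_energy)
    better = "fast_hot" if a < b else "slow_cool"
    print(f"w_energy={w_energy}: fast_hot={a:.1f}, slow_cool={b:.1f}, lower cost: {better}")
```

With the lower energy weight the faster design has the lower cost; doubling the energy weight favors the leaner design.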

Q14: Mean Time Between Repair (TR-1). The Mean Time Between Repair (MTBR) for a single node should be greater than 30 years. Repair is required whenever the node functionality drops below the expected minimum level, necessitating operator service or part replacements. Can you confirm that the following definition is correct? "Is it acceptable for us to interpret TR-1 as when the system is in its expected operating life (say 10 years), the MTBR is 30 years?"
A14: The 30-year MTBR for a node is a statistical target that contributes to an acceptable overall MTBR for a system composed of hundreds or thousands of nodes. An Offeror may specify the expected service life during which a set of nodes will meet this target.
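
As a rough illustration of why a 30-year node target matters at system scale, the sketch below assumes that node repairs are independent events occurring at a constant rate, so the system-level MTBR is approximately the node MTBR divided by the node count. The independence assumption, the function name, and the sample node counts are illustrative and are not taken from the SOW.

```python
HOURS_PER_YEAR = 365.25 * 24


def system_mtbr_hours(node_mtbr_years: float, node_count: int) -> float:
    """Approximate system MTBR in hours.

    Assumes independent nodes with a constant repair rate, so repair
    rates add and the system MTBR is the node MTBR divided by node_count.
    """
    if node_mtbr_years <= 0 or node_count <= 0:
        raise ValueError("node_mtbr_years and node_count must be positive")
    return node_mtbr_years * HOURS_PER_YEAR / node_count


# Hypothetical system sizes in the "hundreds or thousands of nodes" range.
for nodes in (100, 1_000, 5_000):
    hours = system_mtbr_hours(30, nodes)
    print(f"{nodes:>5} nodes: about {hours:,.0f} hours ({hours / 24:.1f} days) between repairs")
```

Under these assumptions, a system of 1,000 nodes that each meet the 30-year target would need a repair roughly every 11 days.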

Q15: What amount of funding is available for the FastForward 2 contracts?
A15: The total amount of funding for FastForward 2 is expected to be $100M.

Q16: What is "Mean Time Between Interrupts" in the Draft SOW? In the FastForward Draft SOW, the target requirement was defined in terms of "mean time to user intervention" whereas it is unclear in the FF2 Draft SOW whether the term "interrupt" refers to user intervention or to any event in the system due to a fault (e.g., a checkpoint restore operation that is triggered due to a detected fault).
A16: As stated in Section 4.3, we are interested in improvements both in hardware reliability and in local mitigation that reduces the need for user intervention. The overall MTBI is the mean time between events that require immediate user intervention to restore full node capability.

Q17: For the NIC Integration TR, what are the assumptions behind 2B PGAS msgs/s? Specifically,

Also, is there an implicit distinction in the RFP between MPI and PGAS that is more properly couched as 2-sided and 1-sided communication? In other words, is the MPI messaging rate assumed to involve 2-sided communication as opposed to 1-sided for PGAS communication?
A17: The 2B PGAS message/second rate refers to data movement between nodes. The target can be achieved as an aggregate across multiple NICs on a node, but it does not include data movement within a node. Although efficient multicast is of interest, the target refers to point-to-point communication between nodes. Also, network performance is not a focus of FF2. The purpose of this TR is to ensure that the node architecture can handle network traffic at these rates. Finally, regarding the PGAS/MPI distinction, we expect the PGAS programming model to produce more message traffic than MPI. The distinction is not so much 1- vs. 2-sided communication; it’s more about the demands on internode communication.

Q18: The SOW (in Section 8.5.1) states "All lead and key personnel should be identified by name and brief CV’s for these personnel should be provided." Should both the names and CVs be provided as part of the Technical Proposal or as a separate document?
A18: The technical team will review "The expertise and skill level of key Offeror personnel." To do this, the reviewers must have the CVs. Please include them with your technical proposal. They do not count toward the 25-page limit.

Q19: One of the performance metrics for the Node Architecture is "Energy per bit for data transfers." Can you please explain from where and to where the data is being transferred for this metric? For example, is this from memory to a register on the processor chip?
A19: Improvements in the energy per bit for data transfers between any subsystems within a node are of interest. Transfers from memory to a register would be one example, as would transfers between other levels of the memory hierarchy or transfers between memory and the network interface. This MR calls for Offerors to address "relevant" metrics, so a responsive proposal need not quantify improvements to the energy per bit for data transfers between every possible pair of subsystems if such improvements are not presented as a benefit of the proposed design.


This page last modified on May 5, 2014
