Leona Carlyle-Kakar (Ak-Chin), Instrumental in Securing the 1st Water Rights Settlement in Indian Country, Walks On

Leona Carlyle-Kakar, a prominent member of the Ak-Chin Indian Community, has passed on to the spirit world. She was instrumental in securing one of the most substantial water settlements for Indian Country and spurring the economic development of her tribe.

To build a better AI helper, start by modeling the irrational behavior of humans

To build AI systems that can collaborate effectively with humans, it helps to have a good model of human behavior to start with. But humans tend to behave suboptimally when making decisions.

This irrationality, which is especially difficult to model, often boils down to computational constraints. A human can’t spend decades thinking about the ideal solution to a single problem.

Researchers at MIT and the University of Washington developed a way to model the behavior of an agent, whether human or machine, that accounts for the unknown computational constraints that may hamper the agent’s problem-solving abilities.

Their model can automatically infer an agent’s computational constraints by seeing just a few traces of their previous actions. The result, an agent’s so-called “inference budget,” can be used to predict that agent’s future behavior.

In a new paper, the researchers demonstrate how their method can be used to infer someone’s navigation goals from prior routes and to predict players’ subsequent moves in chess matches. Their technique matches or outperforms another popular method for modeling this type of decision-making.

Ultimately, this work could help scientists teach AI systems how humans behave, which could enable these systems to respond better to their human collaborators. Being able to understand a human’s behavior, and then to infer their goals from that behavior, could make an AI assistant much more useful, says Athul Paul Jacob, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

“If we know that a human is about to make a mistake, having seen how they have behaved before, the AI agent could step in and offer a better way to do it. Or the agent could adapt to the weaknesses that its human collaborators have. Being able to model human behavior is an important step toward building an AI agent that can actually help that human,” he says.

Jacob wrote the paper with Abhishek Gupta, assistant professor at the University of Washington, and senior author Jacob Andreas, associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

Modeling behavior

Researchers have been building computational models of human behavior for decades. Many prior approaches try to account for suboptimal decision-making by adding noise to the model. Instead of the agent always choosing the correct option, the model might have that agent make the correct choice 95 percent of the time.

However, these methods can fail to capture the fact that humans do not always behave suboptimally in the same way.
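The noise-based approach described above can be sketched in a few lines. This is only an illustration of the general idea, not code from the researchers; the function name `noisy_choice` and its parameters are hypothetical. It shows the limitation mentioned here: the error rate is a fixed number that does not depend on the problem or the person.

```python
import random

def noisy_choice(options, best, p_correct=0.95, rng=random):
    """Pick `best` with probability p_correct, otherwise a uniformly
    random alternative. The error rate is the same for every problem."""
    others = [o for o in options if o != best]
    if not others or rng.random() < p_correct:
        return best
    return rng.choice(others)

# Simulate many decisions with a seeded generator for repeatability.
rng = random.Random(0)
picks = [noisy_choice(["a", "b", "c"], "a", rng=rng) for _ in range(10_000)]
fraction_correct = sum(p == "a" for p in picks) / len(picks)
print(round(fraction_correct, 2))
```

Because `p_correct` is constant, such a model cannot express that the same person may plan carefully on one problem and hastily on another.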

Others at MIT have also studied more effective ways to plan and infer goals in the face of suboptimal decision-making.

To build their model, Jacob and his collaborators drew inspiration from prior studies of chess players. They noticed that players took less time to think before acting when making simple moves and that stronger players tended to spend more time planning than weaker ones in challenging matches.

“At the end of the day, we saw that the depth of the planning, or how long someone thinks about the problem, is a really good proxy of how humans behave,” Jacob says.

They built a framework that could infer an agent’s depth of planning from prior actions and use that information to model the agent’s decision-making process.

The first step in their method involves running an algorithm for a set amount of time to solve the problem being studied. For instance, if they are studying a chess match, they might let the chess-playing algorithm run for a certain number of steps. At the end, the researchers can see the decisions the algorithm made at each step.

Their model compares these decisions to the behaviors of an agent solving the same problem. It will align the agent’s decisions with the algorithm’s decisions and identify the step where the agent stopped planning.

From this, the model can determine the agent’s inference budget, or how long that agent will plan for this problem. It can use the inference budget to predict how that agent would react when solving a similar problem.
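The steps above can be illustrated with a toy sketch. The article does not give the details of the inference procedure, so this example substitutes a simple agreement count for whatever the paper actually uses, and the planner is a made-up stand-in: after step k it has examined the first k candidate values and picks the largest seen so far. All names (`plan_trace`, `infer_budget`, `predict`) are hypothetical.

```python
def plan_trace(values):
    """Toy anytime planner. trace[k-1] is the index it would choose
    after examining only the first k values."""
    trace = []
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:
            best = i
        trace.append(best)
    return trace

def infer_budget(observations):
    """observations: list of (values, chosen_index) pairs from one agent.
    Returns the number of planning steps whose decisions agree with the
    most observed choices; ties go to the smaller budget."""
    max_steps = min(len(values) for values, _ in observations)
    scores = []
    for budget in range(1, max_steps + 1):
        agree = sum(plan_trace(values)[budget - 1] == choice
                    for values, choice in observations)
        scores.append((agree, -budget))
    _, neg_budget = max(scores)
    return -neg_budget

def predict(values, budget):
    """Predict the agent's choice on a new problem given its budget."""
    trace = plan_trace(values)
    return trace[min(budget, len(trace)) - 1]

# An agent that only ever looks at the first three values.
observed = [
    ([3, 1, 4, 1, 5], 2),
    ([2, 7, 1, 8, 2], 1),
    ([1, 2, 3, 9, 0], 2),
]
budget = infer_budget(observed)
print(budget, predict([5, 6, 2, 9, 1], budget))  # prints: 3 1
```

In this toy setting, the inferred budget of 3 predicts that the agent will miss the better option at index 3 on the new problem, which is the kind of mistake an assistant could anticipate.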

An interpretable solution

This method can be very efficient because the researchers can access the full set of decisions made by the problem-solving algorithm without doing any extra work. This framework could also be applied to any problem that can be solved with a particular class of algorithms.

“For me, the most striking thing was the fact that this inference budget is very interpretable. It is saying tougher problems require more planning or being a strong player means planning for longer. When we first set out to do this, we didn’t think that our algorithm would be able to pick up on those behaviors naturally,” Jacob says.

The researchers tested their approach in three different modeling tasks: inferring navigation goals from previous routes, guessing someone’s communicative intent from their verbal cues, and predicting subsequent moves in human-human chess matches.

Their method either matched or outperformed a popular alternative in each experiment. Moreover, the researchers saw that their model of human behavior matched up well with measures of player skill (in chess matches) and task difficulty.

Moving forward, the researchers want to use this approach to model the planning process in other domains, such as reinforcement learning (a trial-and-error method commonly used in robotics). In the long run, they intend to keep building on this work toward the larger goal of developing more effective AI collaborators.

This work was supported, in part, by the MIT Schwarzman College of Computing Artificial Intelligence for Augmentation and Productivity program and the National Science Foundation.

Navy’s Single Sub Buy Plan Raises Concerns with Congress

A Virginia-class under construction at General Dynamics Electric Boat. EB Photo

Lawmakers raised concerns with Navy leaders this week over lags in U.S. naval shipbuilding after the service altered plans for attack boat procurement for the first time in more than a decade.

Seeking to adhere to the Fiscal Responsibility Act of 2023 that capped defense spending, the Navy released a Fiscal Year 2025 budget that proposes buying six battle force ships and decommissioning 19. In a break with more than a decade of procurement, the service elected to buy one Virginia-class submarine instead of two.

Lawmakers have criticized that decision and voiced concern over what the decreased buy means for the submarine industrial base.

Industry is currently building Virginias at a rate of about 1.3 boats per year. Navy officials argue that the backlog at General Dynamics Electric Boat in Connecticut and HII’s Newport News Shipbuilding in Virginia justifies the decision to buy one attack boat this year and that a multi-billion dollar funding injection into the submarine industrial base will prevent disruptions to suppliers.

During a House Armed Services seapower and projection forces subcommittee hearing on Wednesday, ranking member Rep. Joe Courtney (D-Conn.) contested the Navy’s assumptions and argued the purchase reduction would take $1 billion out of the program, with limited benefits to many companies in the industrial base.

The Navy is asking for $7.3 billion for the Virginia-class program, with $3.6 billion for the one submarine and $3.7 billion in advanced procurement funding for submarines it plans to buy in FY 2026 and FY 2027. In an exchange with Navy acquisition chief Nickolas Guertin, Courtney pointed to submarine suppliers that wouldn’t benefit from advanced procurement funding.

“Not every supply chain company that feeds into the submarine program is an advanced procurement long lead supply company,” Courtney said. “A lot of the other pieces of the submarine that are funded through just the full funding of any vessel may not be in that bucket of advanced procurement.”

Asked what happens to the companies that do not contribute to long lead items, Guertin told Courtney the service had to make “difficult trades” but cited the advanced procurement money for boats in later years.

“The supply chain actors that are perhaps not in that same category, we’ll have to take a closer look at that one and see what we can do to help that aspect of the industry as well,” Guertin said.

Courtney issued a three-page memo about the submarine industrial base to his HASC colleagues before Wednesday’s hearing. In that memo, Courtney cited the consequences of the decreased attack boat buy, including almost $1 billion of “sunk costs” from advanced procurement funding the Navy received in FY 2023 and FY 2024.

“The Navy has been unable to demonstrate how these funds will be used now that they have reduced procurement from two submarines in FY25 down to one,” Courtney wrote in the memo.

“Either way, by not using the already-purchased parts on the second FY25 boat, the Navy is not only sinking costs, but is creating uncertainty in the industrial base which will have dramatic effects to the number of domestic suppliers and their workforce as they consider if they have a stable, strong demand signal to make capital investments to increase capacity.”

Speaking to reporters before the hearing, Courtney said money for the submarine industrial base projects might not benefit the narrow bench of suppliers that need assistance the most.

“If you look at the SIB investment and break it down, the fact is a significant portion of that is not going into submarine production,” he said.

“There’s definitely money going into submarine maintenance, which is a good thing. There’s no question that that’s been neglected for a really long time. But even the portion that is allocated for supply chain companies, it’s not going to be like peanut butter being spread across the whole universe of supply chain companies. There are supply chain companies, which don’t fall into the SIB formula for assistance. So for them, they’re basically looking at a 50 percent cut.”

The Navy has not detailed how it plans to use the more than $17 billion in SIB funding planned for the next five years across maintenance, new production and public shipyard infrastructure improvements. Some of the money marked for the SIB is for projects like updating the drydocks at the four public shipyards as part of the Shipyard Infrastructure Optimization Plan, or SIOP.

“What’s the right strategy here?” Courtney said rhetorically to reporters earlier this week.

“What the Navy has embraced is a cut procurement strategy and I think that’s the wrong way to go.”

“We’re seeing boats that are coming in late, but they’re coming in and they’re coming in with high-quality scores on their sea trials,” he added.

Courtney said he was in touch with Australian officials about how the one submarine purchased in FY 2025 could affect the tripartite submarine agreement between the U.K., the U.S. and Australia. To meet the Australian demand for three to five Virginia-class attack boats and also build the Columbia-class ballistic missile submarine, Newport News and Electric Boat would have to build at a rate of 2.33 Virginias per year.

“They’re still very much committed to AUKUS and they’re watching this very closely. I can tell you that they have also been talking to the Navy and at this point this has not changed their decisions to make investments in the U.S. industrial base,” Courtney said to reporters.

“Which is an amazing decision on their part. But certainly, the media noise in Australia viewed [the one Virginia buy in 2025] as a potential obstacle for AUKUS.”

Subcommittee chairman Rep. Trent Kelly (R-Miss.) said cutting an attack submarine will not allow the U.S. to get to the build rate it needs to support the Australians.

“Sending a strong demand signal is essential for maintaining industrial capabilities and international confidence in our naval production,” Kelly said in his opening statement. “Reducing our order from two boats to one in this year’s budget does not inspire confidence, nor does it forecast stability to our industry partners and allies.”

Some lawmakers, like Senate Appropriations Committee defense subcommittee chairman Sen. Jon Tester (D-Mont.), echoed Navy Secretary Carlos Del Toro’s criticisms of industry over the last few months.

“Secretary Del Toro, you had a 45-day review of Navy shipbuilding and documented a three year delay in the new frigate program, a two-year delay in the latest aircraft carrier, two to three year delay in Virginia-class attack submarines, a 12 to 16 month delay in the Columbia-class submarine. This is despite unprecedented support by Congress,” Tester said Tuesday during his subcommittee’s hearing.

“Headlines and delays like this should constitute a full-blown emergency for the Navy and shipbuilding industry. I know there are no quick fixes, but I have yet to see what either party is prepared to do about it. And to be specific, who’s being held accountable?”

Del Toro referred to shipbuilders’ stock buybacks during his testimony before Tester’s subcommittee on Tuesday.

“Many of these shipyards are actually making stock buyback programs, where they’re buying back stocks to the tune of $4 billion and upwards of that, that’s unforgivable. And we need them to invest in their own shipyards, just as much as the American taxpayer is investing in the budget to get these production lines back up,” Del Toro said.

According to data provided to USNI News, the seven major U.S. naval shipyards have invested about $5 billion in capital improvements over the last five years.

During the HASC seapower hearing, Kelly said it’s “unclear exactly what has been achieved” through the 45-day review.

“What is clear is we need a deeper level of self-reflection from the Navy,” Kelly said.

Using deep learning to image the Earth’s planetary boundary layer

Although the troposphere is often thought of as the closest layer of the atmosphere to the Earth’s surface, the planetary boundary layer (PBL) — the lowest layer of the troposphere — is actually the part that most significantly influences weather near the surface. In the 2018 planetary science decadal survey, the PBL was raised as an important scientific issue that has the potential to enhance storm forecasting and improve climate projections.  

“The PBL is where the surface interacts with the atmosphere, including exchanges of moisture and heat that help lead to severe weather and a changing climate,” says Adam Milstein, a technical staff member in Lincoln Laboratory’s Applied Space Systems Group. “The PBL is also where humans live, and the turbulent movement of aerosols throughout the PBL is important for air quality that influences human health.” 

Although vital for studying weather and climate, important features of the PBL, such as its height, are difficult to resolve with current technology. In the past four years, Lincoln Laboratory staff have been studying the PBL, focusing on two different tasks: using machine learning to make 3D-scanned profiles of the atmosphere, and resolving the vertical structure of the atmosphere more clearly in order to better predict droughts.  

This PBL-focused research effort builds on more than a decade of related work on fast, operational neural network algorithms developed by Lincoln Laboratory for NASA missions. These missions include the Time-Resolved Observations of Precipitation structure and storm Intensity with a Constellation of Smallsats (TROPICS) mission as well as Aqua, a satellite that collects data about Earth’s water cycle and observes variables such as ocean temperature, precipitation, and water vapor in the atmosphere. These algorithms retrieve temperature and humidity from the satellite instrument data and have been shown to significantly improve the accuracy and usable global coverage of the observations over previous approaches. For TROPICS, the algorithms help retrieve data that are used to characterize a storm’s rapidly evolving structures in near-real time, and for Aqua, they have helped improve forecasting models, drought monitoring, and fire prediction.

These operational algorithms for TROPICS and Aqua are based on classic “shallow” neural networks to maximize speed and simplicity, creating a one-dimensional vertical profile for each spectral measurement collected by the instrument over each location. While this approach has improved observations of the atmosphere down to the surface overall, including the PBL, laboratory staff determined that newer “deep” learning techniques that treat the atmosphere over a region of interest as a three-dimensional image are needed to improve PBL details further.

“We hypothesized that deep learning and artificial intelligence (AI) techniques could improve on current approaches by incorporating a better statistical representation of 3D temperature and humidity imagery of the atmosphere into the solutions,” Milstein says. “But it took a while to figure out how to create the best dataset — a mix of real and simulated data; we needed to prepare to train these techniques.”

The team collaborated with Joseph Santanello of the NASA Goddard Space Flight Center and William Blackwell, also of the Applied Space Systems Group, in a recent NASA-funded effort showing that these retrieval algorithms can improve PBL detail, including more accurate determination of the PBL height than the previous state of the art. 

While improved knowledge of the PBL is broadly useful for increasing understanding of climate and weather, one key application is prediction of droughts. According to a Global Drought Snapshot report released last year, droughts are a pressing planetary issue that the global community needs to address. Lack of humidity near the surface, specifically at the level of the PBL, is the leading indicator of drought. While previous studies using remote-sensing techniques have examined the humidity of soil to determine drought risk, studying the atmosphere can help predict when droughts will happen.  

In an effort funded by Lincoln Laboratory’s Climate Change Initiative, Milstein, along with laboratory staff member Michael Pieper, is working with scientists at NASA’s Jet Propulsion Laboratory (JPL) to use neural network techniques to improve drought prediction over the continental United States. While the work builds on existing operational work JPL has done incorporating (in part) the laboratory’s operational “shallow” neural network approach for Aqua, the team believes that this work and the PBL-focused deep learning research work can be combined to further improve the accuracy of drought prediction.

“Lincoln Laboratory has been working with NASA for more than a decade on neural network algorithms for estimating temperature and humidity in the atmosphere from space-borne infrared and microwave instruments, including those on the Aqua spacecraft,” Milstein says. “Over that time, we have learned a lot about this problem by working with the science community, including learning about what scientific challenges remain. Our long experience working on this type of remote sensing with NASA scientists, as well as our experience with using neural network techniques, gave us a unique perspective.”

According to Milstein, the next step for this project is to compare the deep learning results to datasets from the National Oceanic and Atmospheric Administration, NASA, and the Department of Energy collected directly in the PBL using radiosondes, a type of instrument flown on a weather balloon. “These direct measurements can be considered a kind of ‘ground truth’ to quantify the accuracy of the techniques we have developed,” Milstein says.

This improved neural network approach holds promise to demonstrate drought prediction that can exceed the capabilities of existing indicators, Milstein says, and to be a tool that scientists can rely on for decades to come.

Ukraine’s Experience in Developing Lethal Drones Should Be Lesson for NATO, Says Panel

Bayraktar TB2 of Ukrainian Air Force. Ministry of Defense of Photo

Kyiv’s ability to rapidly create and field software to a homegrown community of unmanned aerial vehicles specialists is a lesson every NATO nation should take away from the war in Ukraine, the coauthor of a UAV effectiveness study said.

Speaking Tuesday at the Wilson Center, Jack Watling, a senior research fellow at the Royal United Services Institute, said Ukraine, using specialists from its tech sector, has been able to make adaptations “to keep [combat UAVs] effective.”

This time frame can be as short as two weeks, especially if the Russians have downed the UAV and begun to adapt their countermeasures to the technology, he added. But the Ukrainians have airframes that allow “rapid insertion of modular [software] change,” he said.

If the software isn’t changed, the paper added, “between six and 12 weeks, the adversary would have gathered sufficient data on the waveforms and techniques being used to start effectively jamming and/or spoofing the system across the front. If a new UAV control technique is used near to a specialized counter-UAV EW asset, such as the Russian Shipovnik-Aero, then the process of enemy adaptation is significantly faster – typically around two weeks.”

While the speed of getting the technology to the warfighter is there, the drawback is scale. Ukraine does not have the industrial capacity to build UAVs in the numbers that the Kremlin can. Moscow, however, struggles with a much slower development-to-fielding process.

Watling said NATO’s militaries likewise cannot make lightning-fast changes.

He noted that as early as 2020, the war between Armenia and Azerbaijan demonstrated a lesson every military should understand: “If you don’t prepare for [drone’s impact from surveillance to long-range strike], it will go badly for you.” Azerbaijan’s heavy investment in combat drones “allowed them to conduct precision strikes – saving their ground-force strength while inflicting serious damage and casualties against the Armenian-backed forces,” according to an article in the January 2022 issue of U.S. Naval Institute Proceedings.

At the online Wilson Center event, Watling said low-cost and often disposable, “UAVs tend to complement artillery, ISR [intelligence, surveillance and reconnaissance]” and existing platforms. He said the impact they have had on counterbattery artillery fire in Ukraine is significant.

Other UAVs for lingering ISR at altitudes between 15,000 and 30,000 feet and for precision strike are not cheap or easily replaced.

As Watling’s and Justin Bronk’s paper “Mass Precision Strike” concludes, “UAV effectiveness is ultimately dependent upon their interaction with artillery, electronic warfare, air defense and other force elements. UAVs may redistribute the balance of missions assigned to different systems, but they do not eliminate the requirement for traditional artillery.”

The cost of UAVs can vary widely, from “toys costing $150” to hundreds of thousands of dollars for long-range precision strike, Watling added.

Another key difference between Ukraine and NATO in drone warfare is Kyiv’s reliance on “a community of specialists” for planning, operating and maintaining drones rather than wide distribution among the force for more complicated operations.

In their paper, Watling and Bronk wrote, “mass precision strike should be managed by a specialist formation. This is not only because of the significant improvement in effectiveness achievable with skilled mission planning. Experience from contemporary theatres shows that almost all UAV capabilities are highly susceptible to hard counters as the adversary learns how the UAV functions; capabilities must therefore be continuously adapted and their supporting mission data files updated. This requires scarce skills such as UAV design and programming and the accumulation of data centrally. It therefore makes sense to concentrate UAV operation if UAVs are parts of a mass precision strike complex.”

Watling said these specialists also may lack basic military skills used by land forces. He added their specialist skills “are fairly scarce” in Western militaries because their UAVs’ software may need to be changed in 48 hours. NATO UAV operations aren’t calibrated for that kind of speedy turnaround.

That speed underlines another fundamental difference in how Kyiv treats changes to UAVs during combat and how NATO regards them during peacetime. The alliance “regards them as aircraft, and aircraft standards are fairly rigorous” regarding training, operations and maintenance for many reasons, primarily safety on the ground and in the air.

Each change, he said, now requires recertification. “The speed of relevance” can disappear through the long process that would have to be constantly redone for every iteration.

Watling added these differences must be addressed legislatively and through regulations. The UAV industry also needs to examine its practices in light of what would be needed when drones with a host of capabilities are in combat.

“We need to get after that. We need to come up with good control mechanisms.”

Advancing technology for aquaculture

According to the National Oceanic and Atmospheric Administration, aquaculture in the United States represents a $1.5 billion industry annually. Like land-based farming, shellfish aquaculture requires healthy seed production in order to maintain a sustainable industry. Aquaculture hatchery production of shellfish larvae — seeds — requires close monitoring to track mortality rates and assess health from the earliest stages of life. 

Careful observation is necessary to inform production scheduling, determine effects of naturally occurring harmful bacteria, and ensure sustainable seed production. This is an essential step for shellfish hatcheries but is currently a time-consuming manual process prone to human error. 

With funding from MIT’s Abdul Latif Jameel Water and Food Systems Lab (J-WAFS), MIT Sea Grant is working with Associate Professor Otto Cordero of the MIT Department of Civil and Environmental Engineering, Professor Taskin Padir and Research Scientist Mark Zolotas at the Northeastern University Institute for Experiential Robotics, and others at the Aquaculture Research Corporation (ARC), and the Cape Cod Commercial Fishermen’s Alliance, to advance technology for the aquaculture industry. Located on Cape Cod, ARC is a leading shellfish hatchery, farm, and wholesaler that plays a vital role in providing high-quality shellfish seed to local and regional growers.

Two MIT students have joined the effort this semester, working with Robert Vincent, MIT Sea Grant’s assistant director of advisory services, through the Undergraduate Research Opportunities Program (UROP). 

First-year student Unyime Usua and sophomore Santiago Borrego are using microscopy images of shellfish seed from ARC to train machine learning algorithms that will help automate the identification and counting process. The resulting user-friendly image recognition tool aims to aid aquaculturists in differentiating and counting healthy, unhealthy, and dead shellfish larvae, improving accuracy and reducing time and effort.

Vincent explains that AI is a powerful tool for environmental science that enables researchers, industry, and resource managers to address challenges that have long been pinch points for accurate data collection, analysis, predictions, and streamlining processes. “Funding support from programs like J-WAFS enable us to tackle these problems head-on,” he says. 

ARC faces challenges with manually quantifying larvae classes, an important step in their seed production process. “When larvae are in their growing stages they are constantly being sized and counted,” explains Cheryl James, ARC larval/juvenile production manager. “This process is critical to encourage optimal growth and strengthen the population.” 

Developing an automated identification and counting system will help to improve this step in the production process with time and cost benefits. “This is not an easy task,” says Vincent, “but with the guidance of Dr. Zolotas at the Northeastern University Institute for Experiential Robotics and the work of the UROP students, we have made solid progress.” 

The UROP program benefits both researchers and students. Involving MIT UROP students in developing these types of systems provides insights into AI applications that they might not have considered, providing opportunities to explore, learn, and apply themselves while contributing to solving real challenges.

Borrego saw this project as an opportunity to apply what he’d learned in class 6.390 (Introduction to Machine Learning) to a real-world issue. “I was starting to form an idea of how computers can see images and extract information from them,” he says. “I wanted to keep exploring that.”

Usua decided to pursue the project because of the direct industry impacts it could have. “I’m pretty interested in seeing how we can utilize machine learning to make people’s lives easier. We are using AI to help biologists make this counting and identification process easier.” While Usua wasn’t familiar with aquaculture before starting this project, she explains, “Just hearing about the hatcheries that Dr. Vincent was telling us about, it was unfortunate that not a lot of people know what’s going on and the problems that they’re facing.”

On Cape Cod alone, aquaculture is an $18 million per year industry. But the Massachusetts Division of Marine Fisheries estimates that hatcheries are only able to meet 70–80 percent of seed demand annually, which impacts local growers and economies. Through this project, the partners aim to develop technology that will increase seed production, advance industry capabilities, and help understand and improve the hatchery microbiome.

Borrego explains the initial challenge of having limited data to work with. “Starting out, we had to go through and label all of the data, but going through that process helped me learn a lot.” In true MIT fashion, he shares his takeaway from the project: “Try to get the best out of what you’re given with the data you have to work with. You’re going to have to adapt and change your strategies depending on what you have.”

Usua describes her experience going through the research process, communicating in a team, and deciding what approaches to take. “Research is a difficult and long process, but there is a lot to gain from it because it teaches you to look for things on your own and find your own solutions to problems.”

In addition to increasing seed production and reducing the human labor required in the hatchery process, the collaborators expect this project to contribute to cost savings and technology integration to support one of the most underserved industries in the United States. 

Borrego and Usua both plan to continue their work for a second semester with MIT Sea Grant. Borrego is interested in learning more about how technology can be used to protect the environment and wildlife. Usua says she hopes to explore more projects related to aquaculture. “It seems like there’s an infinite amount of ways to tackle these issues.”

Generate customized, compliant application IaC scripts for AWS Landing Zone using Amazon Bedrock

Migrating to the cloud is an essential step for modern organizations aiming to capitalize on the flexibility and scale of cloud resources. Tools like Terraform and AWS CloudFormation are pivotal for such transitions, offering infrastructure as code (IaC) capabilities that define and manage complex cloud environments with precision. However, despite its benefits, IaC’s learning curve, and the complexity of adhering to your organization’s and industry-specific compliance and security standards, could slow down your cloud adoption journey. Organizations typically counter these hurdles by investing in extensive training programs or hiring specialized personnel, which often leads to increased costs and delayed migration timelines.

Generative artificial intelligence (AI) with Amazon Bedrock directly addresses these challenges. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Amazon Bedrock empowers teams to generate Terraform and CloudFormation scripts that are custom fitted to organizational needs while seamlessly integrating compliance and security best practices. Traditionally, cloud engineers learning IaC would manually sift through documentation and best practices to write compliant IaC scripts. With Amazon Bedrock, teams can input high-level architectural descriptions and use generative AI to generate a baseline configuration of Terraform scripts. These generated scripts are tailored to meet your organization’s unique requirements while conforming to industry standards for security and compliance. These scripts serve as a foundational starting point, requiring further refinement and validation to make sure they meet production-level standards.

This solution not only accelerates the migration process but also provides a standardized and secure cloud infrastructure. Additionally, it offers beginner cloud engineers initial script drafts as standard templates to build upon, facilitating their IaC learning journey.

As you navigate the complexities of cloud migration, the need for a structured, secure, and compliant environment is paramount. AWS Landing Zone addresses this need by offering a standardized approach to deploying AWS resources. This makes sure your cloud foundation is built according to AWS best practices from the start. With AWS Landing Zone, you eliminate the guesswork in security configurations, resource provisioning, and account management. It’s particularly beneficial for organizations looking to scale without compromising on governance or control, providing a clear path to a robust and efficient cloud setup.

In this post, we show you how to generate customized, compliant IaC scripts for AWS Landing Zone using Amazon Bedrock.

AWS Landing Zone architecture in the context of cloud migration

AWS Landing Zone can help you set up a secure, multi-account AWS environment based on AWS best practices. It provides a baseline environment to get started with a multi-account architecture, automate the setup of new accounts, and centralize compliance, security, and identity management. The following is an example of a customized Terraform-based AWS Landing Zone solution, in which each application resides in its own AWS account.

The high-level workflow includes the following components:

Module provisioning – Different platform teams across various domains, such as databases, containers, data management, networking, and security, develop and publish certified or custom modules. These are delivered through pipelines to a Terraform private module registry, which is maintained by the organization for consistency and standardization.
Account vending machine layer – The account vending machine (AVM) layer uses either AWS Control Tower, AWS Account Factory for Terraform (AFT), or a custom landing zone solution to vend accounts. In this post, we refer to these solutions collectively as the AVM layer. When application owners submit a request to the AVM layer, it processes the input parameters from the request to provision a target AWS account. This account is then provisioned with tailored infrastructure components through AVM customizations, which include AWS Control Tower customizations or AFT customizations.
Application infrastructure layer – In this layer, application teams deploy their infrastructure components into the provisioned AWS accounts. This is achieved by writing Terraform code within an application-specific repository. The Terraform code calls upon the modules previously published to the Terraform private registry by the platform teams.

Overcoming on-premises IaC migration challenges with generative AI

Teams maintaining on-premises applications often encounter a learning curve with Terraform, a key tool for IaC in AWS environments. This skill gap can be a significant hurdle in cloud migration efforts. Amazon Bedrock, with its generative AI capabilities, plays an essential role in mitigating this challenge. It facilitates the automation of Terraform code creation for the application infrastructure layer, empowering teams with limited Terraform experience to make an efficient transition to AWS.

Amazon Bedrock generates Terraform code from architectural descriptions. The generated code is custom and standardized based on organizational best practices, security, and regulatory guidelines. This standardization is made possible by using advanced prompts in conjunction with Knowledge Bases for Amazon Bedrock, which stores information on organization-specific Terraform modules. This solution uses Retrieval Augmented Generation (RAG) to enrich the input prompt to Amazon Bedrock with details from the knowledge base, making sure the output Terraform configuration and README contents are compliant with your organization’s Terraform best practices and guidelines.
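The prompt enrichment described above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the post: the base instruction text, the function names, and the stand-in retriever are all assumptions, and the retriever is injected so the sketch runs without AWS access.

```python
from typing import Callable, List

# Hypothetical base instruction; the real prompt used by the solution is not shown here.
BASE_INSTRUCTION = (
    "Generate Terraform configurations for AWS services. "
    "Use only the approved modules listed below."
)


def build_enriched_prompt(
    architecture: str,
    retrieve: Callable[[str], List[str]],
    max_snippets: int = 3,
) -> str:
    """Combine the architecture description with retrieved module guidance."""
    snippets = retrieve(architecture)[:max_snippets]
    context = "\n".join(f"- {s}" for s in snippets) or "- (no guidance found)"
    return (
        f"{BASE_INSTRUCTION}\n\n"
        f"Approved modules and guidelines:\n{context}\n\n"
        f"Architecture description:\n{architecture}"
    )


def fake_retrieve(query: str) -> List[str]:
    """Stand-in retriever; a real one would query the knowledge base."""
    return [
        "module 's3_bucket': encryption required",
        "module 'vpc': private subnets only",
    ]


prompt = build_enriched_prompt("Web app with an S3 bucket in a VPC", fake_retrieve)
print(prompt)
```

In a deployed function, the injected retriever could wrap the retrieve operation of the boto3 bedrock-agent-runtime client, returning the text of each result. Keeping it as a parameter makes the prompt assembly testable on its own.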

The following diagram illustrates this architecture.

The workflow consists of the following steps:

The process begins with account vending, where application owners submit a request for a new AWS account. This invokes the AVM, which processes the request parameters to provision the target AWS account.
An architecture description for an application slated for migration is passed as one of the inputs to the AVM layer.
After the account is provisioned, AVM customizations are applied. This can include AWS Control Tower customizations or AFT customizations that set up the account with the necessary infrastructure components and configurations in line with organizational policies.
In parallel, the AVM layer invokes a Lambda function to generate Terraform code. This function enriches the architecture description with a customized prompt, and uses RAG to further enhance the prompt with organization-specific coding guidelines from Knowledge Bases for Amazon Bedrock. The knowledge base includes tailored best practices, security guardrails, and guidelines specific to the organization. See an illustrative example of organization-specific Terraform module specifications and guidelines uploaded to the knowledge base.
Before deployment, the initial draft of the Terraform code is thoroughly reviewed by cloud engineers or an automated code review system to confirm that it meets all technical and compliance standards.
The reviewed and updated Terraform scripts are then used to deploy infrastructure components into the newly provisioned AWS account, setting up compute, storage, and networking resources required for the application.

Solution overview

The AWS Landing Zone deployment uses a Lambda function for generating Terraform scripts from architectural inputs. This function, which is central to the operation, translates these inputs into compliant code, using Amazon Bedrock and Knowledge Bases for Amazon Bedrock. The output is then stored in a GitHub repository, corresponding to the specific application in migration. The following sections detail the prerequisites and specific steps needed to implement this solution.

Prerequisites

You should have the following:

An AWS account with appropriate AWS Identity and Access Management (IAM) permissions to create Lambda functions, Amazon Simple Storage Service (Amazon S3) buckets, IAM roles, and other resources.
A GitHub account with a repository to store the generated Terraform scripts.
Access to Amazon Bedrock models. For more information, refer to Model access.

Configure the Lambda function to generate custom code

This Lambda function is a key component in automating the creation of customized, compliant Terraform configurations for AWS services. It commits the generated configurations directly to a designated GitHub repository, aligning with organizational best practices. For the function code, refer to the following GitHub repo. To create the Lambda function, follow the instructions.
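As a rough idea of what the commit step involves, the following sketch builds a request for the GitHub REST API endpoint that creates or updates file contents. It is not taken from the function code in the repo; the organization, repository, file path, and token placeholder are made up for illustration, and the request is only constructed, never sent.

```python
import base64
import json
import urllib.request


def build_commit_request(owner, repo, path, text, message, token, branch="main"):
    """Build (but do not send) a GitHub 'create or update file contents' request."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    payload = {
        "message": message,
        # The contents API expects the file body as base64 text.
        "content": base64.b64encode(text.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Hypothetical repository and file, for illustration only.
req = build_commit_request(
    "example-org",
    "example-app-infra",
    "main.tf",
    'resource "aws_s3_bucket" "logs" {}\n',
    "Add generated Terraform",
    "<token>",
)
print(req.get_method(), req.full_url)
```

Updating a file that already exists also requires the sha of the current blob in the payload, which this sketch leaves out.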

The following diagram illustrates the workflow of the function.

The workflow includes the following steps:

The function is invoked by an event from the AVM layer, containing the architecture description.
The function retrieves and uses Terraform module definitions from the knowledge base.
The function invokes the Amazon Bedrock model twice, following recommended prompt engineering guidelines. The function applies RAG to enrich the input prompt with the Terraform module information, making sure the output code meets organizational best practices.

First, generate Terraform configurations following organizational coding guidelines and include Terraform module details from the knowledge base. For example, the prompt could be: “Generate Terraform configurations for AWS services. Follow security best practices by using IAM roles and least privilege permissions. Include all necessary parameters, with default values. Add comments explaining the overall architecture and the purpose of each resource.”
Second, create a detailed README file. For example: “Generate a detailed README for the Terraform configuration based on AWS services. Include sections on security improvements, cost optimization tips following the AWS Well-Architected Framework. Also, include detailed Cost Breakdown for each AWS service used with hourly rates and total daily and monthly costs.”

It commits the generated Terraform configuration and the README to the GitHub repository, providing traceability and transparency.
Lastly, it responds with success, including URLs to the committed GitHub files, or returns detailed error information for troubleshooting.
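The steps above can be sketched as a single handler with its dependencies injected. This is an illustration under stated assumptions, not the function from the repo: the event key, the shortened prompt texts, the decision to pass the generated Terraform into the README prompt, and the response shape are all hypothetical.

```python
from typing import Callable, Dict, List

# Shortened, hypothetical versions of the two prompts quoted above.
TERRAFORM_TASK = "Generate Terraform configurations for AWS services."
README_TASK = "Generate a detailed README for the Terraform configuration."


def handle(
    event: Dict,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    commit: Callable[[str, str], str],
) -> Dict:
    """Retrieve module details, call the model twice, commit, and respond."""
    try:
        description = event["architecture_description"]  # hypothetical key
        modules = "\n".join(retrieve(description))
        terraform = generate(
            f"{TERRAFORM_TASK}\n\nModules:\n{modules}\n\nArchitecture:\n{description}"
        )
        readme = generate(f"{README_TASK}\n\nTerraform:\n{terraform}")
        urls = [commit("main.tf", terraform), commit("README.md", readme)]
        return {"statusCode": 200, "urls": urls}
    except Exception as exc:
        # Return error details so the caller can troubleshoot.
        return {"statusCode": 500, "error": f"{type(exc).__name__}: {exc}"}


# Stand-ins for the knowledge base, the model, and the GitHub commit.
result = handle(
    {"architecture_description": "Static site on S3"},
    retrieve=lambda q: ["module 's3_bucket' v1.2"],
    generate=lambda p: "# README" if p.startswith(README_TASK) else "# terraform",
    commit=lambda path, text: f"https://github.example/{path}",
)
print(result)
```

Passing the retriever, model call, and commit step as parameters keeps the control flow visible and lets each piece be swapped for a real client later.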

Configure Knowledge Bases for Amazon Bedrock

Follow these steps to set up your knowledge base in Amazon Bedrock:

On the Amazon Bedrock console, choose Knowledge base in the navigation pane.
Choose Create knowledge base.
Enter a clear and descriptive name that reflects the purpose of your knowledge base, such as AWS Account Setup Knowledge Base For Amazon Bedrock.
Assign a pre-configured IAM role with the necessary permissions. It’s typically best to let Amazon Bedrock create this role for you to make sure it has the correct permissions.
Upload a JSON file to an S3 bucket with encryption enabled for security. This file should contain a structured list of AWS services and Terraform modules. For the JSON structure, use the following example from the GitHub repository.
Choose the default embeddings model.
Allow Amazon Bedrock to create and manage the vector store for you in Amazon OpenSearch Service.
Review the information for accuracy. Pay special attention to the S3 bucket URI and IAM role details.
Create your knowledge base.
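Before uploading the JSON file, it can help to check that every entry has the fields your prompts rely on. The sketch below assumes a hypothetical schema (service, module_source, guidelines); the actual structure is defined by the example in the GitHub repository and may differ.

```python
import json
from typing import List

# Hypothetical required fields; adjust to match the structure you actually use.
REQUIRED_KEYS = {"service", "module_source", "guidelines"}


def validate_catalog(raw: str) -> List[str]:
    """Parse the catalog and return a list of problems (empty means none found)."""
    entries = json.loads(raw)
    if not isinstance(entries, list):
        return ["top level must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems


# Made-up catalog entry, for illustration only.
catalog = json.dumps([
    {
        "service": "Amazon S3",
        "module_source": "registry.example.com/platform/s3-bucket/aws",
        "guidelines": ["Enable encryption", "Block public access"],
    }
])
print(validate_catalog(catalog))
```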

After you deploy and configure these components, when your AWS Landing Zone solution invokes the Lambda function, the following files are generated:

A Terraform configuration file – This file specifies the infrastructure setup.
A comprehensive README file – This file documents the security standards embedded within the code, confirming that they align with the security practices outlined in the initial sections. Additionally, this README includes an architectural summary, cost optimization tips, and a detailed cost breakdown for the resources described in the Terraform configuration.

The following screenshot shows an example of the Terraform configuration file.

The following screenshot shows an example of the README file.

Clean up

Complete the following steps to clean up your resources:

Delete the Lambda function if it’s no longer required.
Empty and delete the S3 bucket used for Terraform state storage.
Remove the generated Terraform scripts and README file from the GitHub repo.
Delete the knowledge base if it’s no longer needed.

Conclusion

The generative AI capabilities of Amazon Bedrock not only streamline the creation of compliant Terraform scripts for AWS deployments, but also act as a pivotal learning aid for beginner cloud engineers transitioning on-premises applications to AWS. This approach accelerates the cloud migration process and helps you adhere to best practices. You can also use the solution to provide value after the migration, enhancing daily operations such as ongoing infrastructure and cost optimization. Although we primarily focused on Terraform in this post, these principles can also enhance your AWS CloudFormation deployments, providing a versatile solution for your infrastructure needs.

Ready to simplify your cloud migration process with generative AI in Amazon Bedrock? Begin by exploring the Amazon Bedrock User Guide to understand how it can streamline your organization’s cloud journey. For further assistance and expertise, consider using AWS Professional Services to help you streamline your cloud migration journey and maximize the benefits of Amazon Bedrock.

Unlock the potential for rapid, secure, and efficient cloud adoption with Amazon Bedrock. Take the first step today and discover how it can enhance your organization’s cloud transformation endeavors.

About the Author

Ebbey Thomas specializes in strategizing and developing custom AWS Landing Zone resources with a focus on using generative AI to enhance cloud infrastructure automation. In his role at AWS Professional Services, Ebbey’s expertise is central to architecting solutions that streamline cloud adoption, providing a secure and efficient operational framework for AWS users. He is known for his innovative approach to cloud challenges and his commitment to driving forward the capabilities of cloud services.

Live Meeting Assistant with Amazon Transcribe, Amazon Bedrock, and Knowledge Bases for Amazon Bedrock

See CHANGELOG for latest features and fixes.

You’ve likely experienced the challenge of taking notes during a meeting while trying to pay attention to the conversation. You’ve probably also experienced the need to quickly fact-check something that’s been said, or look up information to answer a question that’s just been asked in the call. Or maybe you have a team member that always joins meetings late, and expects you to send them a quick summary over chat to catch them up.

Then there are the times that others are talking in a language that’s not your first language, and you’d love to have a live translation of what people are saying to make sure you understand correctly.

And after the call is over, you usually want to capture a summary for your records, or to send to the participants, with a list of all the action items, owners, and due dates.

All of this, and more, is now possible with our newest sample solution, Live Meeting Assistant (LMA).

Check out the following demo to see how it works.

In this post, we show you how to use LMA with Amazon Transcribe, Amazon Bedrock, and Knowledge Bases for Amazon Bedrock.

Solution overview

The LMA sample solution captures speaker audio and metadata from your browser-based meeting app (as of this writing, Zoom and Chime are supported), or audio only from any other browser-based meeting app, softphone, or audio source. It uses Amazon Transcribe for speech to text, Knowledge Bases for Amazon Bedrock for contextual queries against your company’s documents and knowledge sources, and Amazon Bedrock models for customizable transcription insights and summaries.

Everything you need is provided as open source in our GitHub repo. It’s straightforward to deploy in your AWS account. When you’re done, you’ll wonder how you ever managed without it!

The following are some of the things LMA can do:

Live transcription with speaker attribution – LMA is powered by Amazon Transcribe ASR models for low-latency, high-accuracy speech to text. You can teach it brand names and domain-specific terminology if needed, using custom vocabulary and custom language model features in Amazon Transcribe.

Live translation – It uses Amazon Translate to optionally show each segment of the conversation translated into your language of choice, from a selection of 75 languages.

Context-aware meeting assistant – It uses Knowledge Bases for Amazon Bedrock to provide answers from your trusted sources, using the live transcript as context for fact-checking and follow-up questions. To activate the assistant, just say “Okay, Assistant,” choose the ASK ASSISTANT! button, or enter your own question in the UI.

On-demand summaries of the meeting – With the click of a button on the UI, you can generate a summary, which is useful when someone joins late and needs to get caught up. The summaries are generated from the transcript by Amazon Bedrock. LMA also provides options for identifying the current meeting topic, and for generating a list of action items with owners and due dates. You can also create your own custom prompts and corresponding options.

Automated summary and insights – When the meeting has ended, LMA automatically runs a set of large language model (LLM) prompts on Amazon Bedrock to summarize the meeting transcript and extract insights. You can customize these prompts as well.

Meeting recording – The audio is (optionally) stored for you, so you can replay important sections of the meeting later.

Inventory list of meetings – LMA keeps track of all your meetings in a searchable list.

Browser extension captures audio and meeting metadata from popular meeting apps – The browser extension captures meeting metadata (the meeting title and names of active speakers) and audio from you (your microphone) and others (from the meeting browser tab). As of this writing, LMA supports Chrome for the browser extension, and Zoom and Chime for meeting apps (with Teams and WebEx coming soon). Standalone meeting apps don’t work with LMA; instead, launch your meetings in the browser.

You are responsible for complying with legal, corporate, and ethical restrictions that apply to recording meetings and calls. Do not use this solution to stream, record, or transcribe calls if otherwise prohibited.

Prerequisites

You need to have an AWS account and an AWS Identity and Access Management (IAM) role and user with permissions to create and manage the necessary resources and components for this application. If you don’t have an AWS account, see How do I create and activate a new Amazon Web Services account?

You also need an existing knowledge base in Amazon Bedrock. If you haven’t set one up yet, see Create a knowledge base. Populate your knowledge base with content to power LMA’s context-aware meeting assistant.

Finally, LMA uses Amazon Bedrock LLMs for its meeting summarization features. Before proceeding, if you have not previously done so, you must request access to the following Amazon Bedrock models:

Titan Embeddings G1 – Text
Anthropic: All Claude models

Deploy the solution using AWS CloudFormation

We’ve provided pre-built AWS CloudFormation templates that deploy everything you need in your AWS account.

If you’re a developer and you want to build, deploy, or publish the solution from code, refer to the Developer README.

Complete the following steps to launch the CloudFormation stack:

Log in to the AWS Management Console.
Choose Launch Stack for your desired AWS Region to open the AWS CloudFormation console and create a new stack.

Launch Stack is available for the following Regions: US East (N. Virginia) and US West (Oregon).

For Stack name, use the default value, LMA.
For Admin Email Address, use a valid email address—your temporary password is emailed to this address during the deployment.
For Authorized Account Email Domain, use the domain name part of your corporate email address to allow users with email addresses in the same domain to create their own new UI accounts, or leave blank to prevent users from directly creating their own accounts. You can enter multiple domains as a comma-separated list.
For MeetingAssistService, choose BEDROCK_KNOWLEDGE_BASE (the only available option as of this writing).
For Meeting Assist Bedrock Knowledge Base Id (existing), enter your existing knowledge base ID (for example, JSXXXXX3D8). You can copy it from the Amazon Bedrock console.

For all other parameters, use the default values.

If you want to customize the settings later, for example to add your own AWS Lambda functions, use custom vocabularies and language models to improve accuracy, enable personally identifiable information (PII) redaction, and more, you can update the stack for these parameters.

Select the acknowledgement check boxes, then choose Create stack.

The main CloudFormation stack uses nested stacks to create the following resources in your AWS account:

Amazon Simple Storage Service (Amazon S3) buckets to hold build artifacts and call recordings
An AWS Fargate task with an Application Load Balancer providing a WebSocket server running code to consume stereo audio streams and relay to Amazon Transcribe, publish transcription segments in Amazon Kinesis Data Streams, and create and store stereo call recordings
A Kinesis data stream to relay call events and transcription segments to the enrichment processing function
LMA resources, including the QnABot on AWS solution stack, which interacts with Amazon OpenSearch Service and Amazon Bedrock
The AWS AppSync API, which provides a GraphQL endpoint to support queries and real-time updates
Website components, including an S3 bucket, Amazon CloudFront distribution, and Amazon Cognito user pool
A downloadable preconfigured browser extension application for Chrome browsers
Other supporting resources, including IAM roles and policies (using least privilege best practices), Amazon Virtual Private Cloud (Amazon VPC) resources, Amazon EventBridge event rules, and Amazon CloudWatch log groups.

The stacks take about 35–40 minutes to deploy. The main stack status shows CREATE_COMPLETE when everything is deployed.

Set your password

After you deploy the stack, open the LMA web user interface and set your password by completing the following steps:

Open the email you received, at the email address you provided, with the subject “Welcome to Live Meeting Assistant!”
Open your web browser to the URL shown in the email. You’re directed to the login page.

The email contains a generated temporary password that you use to log in and create your own password. Your user name is your email address.
Set a new password.

Your new password must have a length of at least eight characters, and contain uppercase and lowercase characters, plus numbers and special characters.

Follow the directions to verify your email address, or choose Skip to do it later.

You’re now logged in to LMA.

You also received a similar email with the subject “QnABot Signup Verification Code.” This email contains a generated temporary password that you use to log in and create your own password in the QnABot designer. You use QnABot designer only if you want to customize LMA options and prompts. Your username for QnABot is Admin. You can set your permanent QnABot Admin password now, or keep this email safe in case you want to customize things later.

Download and install the Chrome browser extension

For the best meeting streaming experience, install the LMA browser plugin (currently available for Chrome):

Choose Download Chrome Extension to download the browser extension .zip file (lma-chrome-extension.zip).

Choose (right-click) and expand the .zip file (lma-chrome-extension.zip) to create a local folder named lma-chrome-extension.
Open Chrome and enter the link chrome://extensions into the address bar.

Enable Developer mode.
Choose Load unpacked, navigate to the lma-chrome-extension folder (which you unzipped from the download), and choose Select. This loads your extension.
Pin the new LMA extension to the browser tool bar for easy access—you will use it often to stream your meetings!

Start using LMA

LMA provides two streaming options:

Chrome browser extension – Use this to stream audio and speaker metadata from your meeting browser app. It currently works with Zoom and Chime, but we hope to add more meeting apps.
LMA Stream Audio tab – Use this to stream audio from your microphone and any Chrome browser-based meeting app, softphone, or audio application.

We show you how to use both options in the following sections.

Use the Chrome browser extension to stream a Zoom call

Complete the following steps to use the browser extension:

Open the LMA extension and log in with your LMA credentials.

Join or start a Zoom meeting in your web browser (do not use the separate Zoom client).

If you already have the Zoom meeting page loaded, reload it.

The LMA extension automatically detects that Zoom is running in the browser tab, and populates your name and the meeting name.

Tell others on the call that you are about to start recording the call using LMA and obtain their permission. Do not proceed if participants object.
Choose Start Listening.

Read and accept the disclaimer, and choose Allow to share the browser tab.

The LMA extension automatically detects and displays the active speaker on the call. If you are alone in the meeting, invite some friends to join, and observe that the names they used to join the call are displayed in the extension when they speak, and are attributed to their words in the LMA transcript.

Choose Open in LMA to see your live transcript in a new tab.
Choose your preferred transcript language, and interact with the meeting assistant using the wake phrase “OK Assistant!” or the Meeting Assist Bot pane.

The ASK ASSISTANT button asks the meeting assistant service (Amazon Bedrock knowledge base) to suggest a good response based on the transcript of the recent interactions in the meeting. Your mileage may vary, so experiment!

When you are done, choose Stop Streaming to end the meeting in LMA.

Within a few seconds, the automated end-of-meeting summaries appear, and the audio recording becomes available. You can continue to use the bot after the call has ended.

Use the LMA UI Stream Audio tab to stream from your microphone and any browser-based audio application

The browser extension is the most convenient way to stream metadata and audio from supported meeting web apps. However, you can also use LMA to stream just the audio from any browser-based softphone, meeting app, or other audio source playing in your Chrome browser, using the convenient Stream Audio tab that is built into the LMA UI.

Open any audio source in a browser tab.

For example, this could be a softphone (such as Google Voice), another meeting app, or for demo purposes, you can simply play a local audio recording or a YouTube video in your browser to emulate another meeting participant. If you just want to try it, open the following YouTube video in a new tab.

In the LMA App UI, choose Stream Audio (no extension) to open the Stream Audio tab.
For Meeting ID, enter a meeting ID.
For Name, enter a name for yourself (applied to audio from your microphone).
For Participant Name(s), enter the names of the participants (applied to the incoming audio source).
Choose Start Streaming.

Choose the browser tab you opened earlier, and choose Allow to share.
Choose the LMA UI tab again to view your new meeting ID listed, showing the meeting as In Progress.

Choose the meeting ID to open the details page, and watch the transcript of the incoming audio, attributed to the participant names that you entered. If you speak, you’ll see the transcription of your own voice.

Use the Stream Audio feature to stream from any softphone app, meeting app, or any other streaming audio playing in the browser, along with your own audio captured from your selected microphone. Always obtain permission from others before recording them using LMA, or any other recording application.

Processing flow overview

How did LMA transcribe and analyze your meeting? Let’s look at how it works. The following diagram shows the main architectural components and how they fit together at a high level.

The LMA user joins a meeting in their browser, enables the LMA browser extension, and authenticates using their LMA credentials. If the meeting app (for example, Zoom.us) is supported by the LMA extension, the user’s name, meeting name, and active speaker names are automatically detected by the extension. If the meeting app is not supported by the extension, then the LMA user can manually enter their name and the meeting topic—active speakers’ names will not be detected.

After getting permission from other participants, the LMA user chooses Start Listening on the LMA extension pane. A secure WebSocket connection is established to the preconfigured LMA stack WebSocket URL, and the user’s authentication token is validated. The LMA browser extension sends a START message to the WebSocket containing the meeting metadata (name, topic, and so on), and starts streaming two-channel audio from the user’s microphone and the incoming audio channel containing the voices of the other meeting participants. The extension monitors the meeting app to detect active speaker changes during the call, and sends that metadata to the WebSocket, enabling LMA to label speech segments with the speaker’s name.
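To make the two-channel stream concrete, here is a small sketch of what a client might send: a JSON START message followed by interleaved stereo audio. The message field names, the channel assignment (microphone on channel 0, incoming audio on channel 1), and the 16-bit little-endian sample format are assumptions for illustration, not the extension's actual wire format.

```python
import json
import struct
from typing import List


def start_message(meeting_name: str, user_name: str) -> str:
    """Serialize hypothetical meeting metadata sent before audio begins."""
    return json.dumps(
        {"event": "START", "meetingName": meeting_name, "userName": user_name}
    )


def interleave_stereo(mic: List[int], incoming: List[int]) -> bytes:
    """Pack two mono 16-bit sample lists into little-endian stereo PCM.

    Channel 0 carries the user's microphone, channel 1 the other participants.
    The shorter channel is padded with silence so frames stay aligned.
    """
    n = max(len(mic), len(incoming))
    mic = mic + [0] * (n - len(mic))
    incoming = incoming + [0] * (n - len(incoming))
    frames = [sample for pair in zip(mic, incoming) for sample in pair]
    return struct.pack(f"<{len(frames)}h", *frames)


chunk = interleave_stereo([1, 2, 3], [10, 20])
print(start_message("Weekly sync", "Ana"), len(chunk))
```

Keeping the two sources on separate channels is what lets a downstream transcriber tell the user's speech apart from everyone else's.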

The WebSocket server running in Fargate consumes the real-time two-channel audio fragments from the incoming WebSocket stream. The audio is streamed to Amazon Transcribe, and the transcription results are written in real time to Kinesis Data Streams.

Each meeting processing session runs until the user chooses Stop Listening in the LMA extension pane, or ends the meeting and closes the tab. At the end of the call, the WebSocket server creates a stereo recording file in Amazon S3 (if recording was enabled when the stack was deployed).

A Lambda function called the Call Event Processor, fed by Kinesis Data Streams, processes and optionally enriches meeting metadata and transcription segments. The Call Event Processor integrates with the meeting assist services. LMA is powered by Amazon Lex, Knowledge Bases for Amazon Bedrock, and Amazon Bedrock LLMs using the open source QnABot on AWS solution for answers based on FAQs and as an orchestrator for request routing to the appropriate AI service. The Call Event Processor also invokes the Transcript Summarization Lambda function when the call ends, to generate a summary of the call from the full transcript.
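A Lambda function fed by Kinesis Data Streams receives records whose payloads are base64 encoded. The following sketch decodes such an event and routes each payload by type. The Records, kinesis, and data keys follow the standard Kinesis event shape for Lambda; the EventType values, the CallId field, and the handler names are hypothetical and not taken from the LMA code.

```python
import base64
import json
from typing import Dict, List


def decode_records(event: Dict) -> List[Dict]:
    """Decode the base64 JSON payloads of a Kinesis-triggered Lambda event."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event.get("Records", [])
    ]


def route(segment: Dict) -> str:
    """Pick a handler name from a hypothetical 'EventType' field."""
    kind = segment.get("EventType")
    if kind == "END":
        return "summarize_transcript"
    if kind == "ADD_TRANSCRIPT_SEGMENT":
        return "persist_segment"
    return "ignore"


# Made-up event carrying a single end-of-call record.
payload = {"EventType": "END", "CallId": "demo-meeting"}
sample = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(json.dumps(payload).encode()).decode()}}
    ]
}
for segment in decode_records(sample):
    print(route(segment))
```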

The Call Event Processor function interfaces with AWS AppSync to persist changes (mutations) in Amazon DynamoDB and send real-time updates to the LMA user’s logged-in web clients (conveniently opened by choosing the Open in LMA option in the browser extension).

The LMA web UI assets are hosted on Amazon S3 and served via CloudFront. Authentication is provided by Amazon Cognito.

When the user is authenticated, the web application establishes a secure GraphQL connection to the AWS AppSync API, and subscribes to receive real-time events such as new calls and call status changes for the meetings list page, and new or updated transcription segments and computed analytics for the meeting details page. When translation is enabled, the web application also interacts securely with Amazon Translate to translate the meeting transcription into the selected language.

The entire processing flow, from ingested speech to live webpage updates, is event driven, and the end-to-end latency is short—typically just a few seconds.

Monitoring and troubleshooting

AWS CloudFormation reports deployment failures and causes on the relevant stack’s Events tab. See Troubleshooting CloudFormation for help with common deployment problems. Look out for deployment failures caused by limit exceeded errors; the LMA stacks create resources that are subject to default account and Region service quotas, such as elastic IP addresses and NAT gateways. When troubleshooting CloudFormation stack failures, always navigate into any failed nested stacks to find the first nested resource failure reported—this is almost always the root cause.
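Finding the first failure can be automated once you have the stack events. The sketch below picks the earliest failed event from a list of dictionaries that use the same field names as CloudFormation stack events (Timestamp, LogicalResourceId, ResourceStatus, ResourceStatusReason). The sample events are made up for illustration, and fetching real events from each nested stack is left out.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def first_failure(events: List[Dict]) -> Optional[Dict]:
    """Return the earliest event whose status ends in FAILED, if any."""
    failed = [e for e in events if e["ResourceStatus"].endswith("FAILED")]
    return min(failed, key=lambda e: e["Timestamp"]) if failed else None


def at(second: int) -> datetime:
    """Helper for building sample timestamps."""
    return datetime(2024, 1, 1, 12, 0, second, tzinfo=timezone.utc)


# Made-up events, newest first, as the console typically lists them.
events = [
    {
        "Timestamp": at(5),
        "LogicalResourceId": "NatGateway",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "Resource creation cancelled",
    },
    {
        "Timestamp": at(2),
        "LogicalResourceId": "ElasticIp",
        "ResourceStatus": "CREATE_FAILED",
        "ResourceStatusReason": "The maximum number of addresses has been reached.",
    },
    {
        "Timestamp": at(0),
        "LogicalResourceId": "Vpc",
        "ResourceStatus": "CREATE_COMPLETE",
    },
]

root = first_failure(events)
print(root["LogicalResourceId"], "-", root["ResourceStatusReason"])
```

Later failures often report only that creation was cancelled, which is why sorting by timestamp rather than reading the list from the top tends to surface the real cause.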

Amazon Transcribe has a default limit of 25 concurrent transcription streams, which limits LMA to 25 concurrent meetings in a given AWS account or Region. Request an increase for the number of concurrent HTTP/2 streams for streaming transcription if you have many users and need to handle a larger number of concurrent meetings in your account.

LMA provides runtime monitoring and logs for each component using CloudWatch:

WebSocket processing and transcribing Fargate task – On the Amazon Elastic Container Service (Amazon ECS) console, navigate to the Clusters page and open the LMA-WEBSOCKETSTACK-xxxx-TranscribingCluster cluster. Choose the Tasks tab and open the task page. Choose Logs and View in CloudWatch to inspect the WebSocket transcriber task logs.
Call Event Processor Lambda function – On the Lambda console, open the LMA-AISTACK-CallEventProcessor function. Choose the Monitor tab to see function metrics. Choose View logs in CloudWatch to inspect function logs.
AWS AppSync API – On the AWS AppSync console, open the CallAnalytics-LMA API. Choose Monitoring in the navigation pane to see API metrics. Choose View logs in CloudWatch to inspect AWS AppSync API logs.

For QnABot on AWS for Meeting Assist, refer to the Meeting Assist README, and the QnABot solution implementation guide for additional information.

Cost assessment

LMA provides a WebSocket server using Fargate (2vCPU) and VPC networking resources costing about $0.10/hour (approximately $72/month). For more details, see AWS Fargate Pricing.

LMA is enabled using QnABot and Knowledge Bases for Amazon Bedrock. You create your own knowledge base, which you use for LMA and potentially other use cases. For more details, see Amazon Bedrock Pricing. Additional AWS services used by the QnABot solution cost about $0.77/hour. For more details, refer to the list of QnABot on AWS solution costs.

The remaining solution costs are based on usage.

The usage costs add up to about $0.17 for a 5-minute call, although this can vary based on options selected (such as translation), number of LLM summarizations, and total usage because usage affects Free Tier eligibility and volume tiered pricing for many services. For more information about the services that incur usage costs, see the following:

AWS AppSync pricing
Amazon Bedrock pricing
Amazon Cognito Pricing
Amazon DynamoDB pricing
AWS Lambda Pricing
Amazon S3 pricing
Amazon Transcribe Pricing
Amazon Translate pricing
QnABot on AWS costs

To explore LMA costs for yourself, use AWS Cost Explorer or choose Bill Details on the AWS Billing Dashboard to see your month-to-date spend by service.
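
As a rough back-of-the-envelope check, the approximate figures quoted in this section can be combined in a few lines of Python. This is only a sketch: the function name is hypothetical, and the assumption that usage cost scales linearly with call minutes is ours, not the post's. It also leaves out Amazon Bedrock knowledge base costs, and the real bill depends on Free Tier eligibility and tiered pricing.

```python
# Rough LMA cost sketch using the approximate figures quoted in this post.
# Assumption (not from the post): usage cost scales linearly with call minutes.

FARGATE_PER_HOUR = 0.10      # WebSocket server + VPC networking, about $0.10/hour
QNABOT_PER_HOUR = 0.77       # additional services used by QnABot, about $0.77/hour
USAGE_PER_5_MIN_CALL = 0.17  # about $0.17 for a 5-minute call


def estimate_monthly_cost(calls, avg_call_minutes, hours_in_month=720):
    """Return approximate fixed, usage, and total monthly cost in USD."""
    fixed = (FARGATE_PER_HOUR + QNABOT_PER_HOUR) * hours_in_month
    usage = calls * (avg_call_minutes / 5) * USAGE_PER_5_MIN_CALL
    return {
        "fixed": round(fixed, 2),
        "usage": round(usage, 2),
        "total": round(fixed + usage, 2),
    }


if __name__ == "__main__":
    # Example: 100 meetings a month, 30 minutes each.
    print(estimate_monthly_cost(calls=100, avg_call_minutes=30))
```

Treat the result as an order-of-magnitude estimate and compare it against AWS Cost Explorer once the stack has been running for a while.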

Customize your deployment

Use the following CloudFormation template parameters when creating or updating your stack to customize your LMA deployment:

To use your own S3 bucket for meeting recordings, use Call Audio Recordings Bucket Name and Audio File Prefix.
To redact PII from the transcriptions, set Enable Content Redaction for Transcripts to true, and adjust Transcription PII Redaction Entity Types as needed. For more information, see Redacting or identifying PII in a real-time stream.
To improve transcription accuracy for technical and domain-specific acronyms and jargon, set Transcription Custom Vocabulary Name to the name of a custom vocabulary that you already created in Amazon Transcribe or set Transcription Custom Language Model Name to the name of a previously created custom language model. For more information, see Improving Transcription Accuracy.
To transcribe meetings in a supported language other than US English, choose the desired value for Language for Transcription.
To customize transcript processing, optionally set Lambda Hook Function ARN for Custom Transcript Segment Processing to the ARN of your own Lambda function. For more information, see Using a Lambda function to optionally provide custom logic for transcript processing.
To customize the meeting assist capabilities based on the QnABot on AWS solution, Amazon Lex, Amazon Bedrock, and Knowledge Bases for Amazon Bedrock integration, see the Meeting Assist README.
To customize transcript summarization by configuring LMA to call your own Lambda function, see Transcript Summarization LAMBDA option.
To customize transcript summarization by modifying the default prompts or adding new ones, see Transcript Summarization.
To change the retention period, set Record Expiration In Days to the desired value. All call data is permanently deleted from the LMA DynamoDB storage after this period. Changes to this setting apply only to new calls received after the update.

LMA is an open source project. You can fork the LMA GitHub repository, enhance the code, and send us pull requests so we can incorporate and share your improvements!

Update an existing LMA stack

You can update your existing LMA stack to the latest release. For more details, see Update an existing stack.

Clean up

Congratulations! You have completed all the steps for setting up your Live Meeting Assistant sample solution using AWS services.

When you’re finished experimenting with this sample solution, clean up your resources by using the AWS CloudFormation console to delete the LMA stacks that you deployed. This deletes resources that were created by deploying the solution. The recording S3 buckets, DynamoDB table, and CloudWatch log groups are retained after the stack is deleted to avoid deleting your data.

Live Call Analytics: Companion solution

Our companion solution, Live Call Analytics and Agent Assist (LCA), offers real-time transcription and analytics for contact centers (phone calls) rather than meetings. There are many similarities—in fact, LMA was built using an architecture and many components derived from LCA.

Conclusion

The Live Meeting Assistant sample solution offers a flexible, feature-rich, and customizable approach to provide live meeting assistance to improve your productivity during and after meetings. It uses Amazon AI/ML services like Amazon Transcribe, Amazon Lex, Knowledge Bases for Amazon Bedrock, and Amazon Bedrock LLMs to transcribe and extract real-time insights from your meeting audio.

The sample LMA application is provided as open source—use it as a starting point for your own solution, and help us make it better by contributing back fixes and features via GitHub pull requests. Browse to the LMA GitHub repository to explore the code, choose Watch to be notified of new releases, and check the README for the latest documentation updates.

For expert assistance, AWS Professional Services and other AWS Partners are here to help.

We’d love to hear from you. Let us know what you think in the comments section, or use the issues forum in the LMA GitHub repository.

About the authors

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Chris Lott is a Principal Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California and enjoys gardening, aerospace, and traveling the world.

Babu Srinivasan is a Sr. Specialist SA – Language AI services in the World Wide Specialist organization at AWS, with over 24 years of experience in IT and the last 6 years focused on the AWS Cloud. He is passionate about AI/ML. Outside of work, he enjoys woodworking and entertains friends and family (sometimes strangers) with sleight of hand card magic.

Kishore Dhamodaran is a Senior Solutions Architect at AWS.

Gillian Armstrong is a Builder Solutions Architect. She is excited about how the Cloud is opening up opportunities for more people to use technology to solve problems, and especially excited about how cognitive technologies, like conversational AI, are allowing us to interact with computers in more human ways.

Meta Llama 3 models are now available in Amazon SageMaker JumpStart

Today, we are excited to announce that Meta Llama 3 foundation models are available through Amazon SageMaker JumpStart to deploy and run inference. The Llama 3 models are a collection of pre-trained and fine-tuned generative text models.

In this post, we walk through how to discover and deploy Llama 3 models via SageMaker JumpStart.

What is Meta Llama 3

Llama 3 comes in two parameter sizes, 8B and 70B, both with 8k context length, that can support a broad range of use cases with improvements in reasoning, code generation, and instruction following. Llama 3 uses a decoder-only transformer architecture and a new tokenizer with a vocabulary of 128k tokens that improves model performance. In addition, Meta improved post-training procedures that substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. You can now derive the combined advantages of Llama 3 performance and MLOps controls with Amazon SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs. In addition, the model will be deployed in an AWS secure environment under your VPC controls, helping provide data security.

What is SageMaker JumpStart

With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Llama 3 models with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping provide data security. Llama 3 models are available today for deployment and inferencing in Amazon SageMaker Studio in us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-2 (Oregon), eu-west-1 (Ireland) and ap-northeast-1 (Tokyo) AWS Regions.

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can easily discover various models by browsing through different hubs, which are named after model providers. You can find Llama 3 models in the Meta hub. If you do not see Llama 3 models, update your SageMaker Studio version by shutting down and restarting. For more information, refer to Shut down and Update Studio Classic Apps.

You can find Llama 3 models by searching for “Meta-llama-3” in the search box located at the top left.

You can discover all Meta models available in SageMaker JumpStart by clicking on the Meta hub.

Clicking on a model card opens the corresponding model detail page, from which you can easily Deploy the model.

Deploy a model

When you choose Deploy and acknowledge the EULA terms, deployment will start.

You can monitor progress of the deployment on the page that shows up after clicking the Deploy button.

Alternatively, you can choose Open notebook to deploy through the example notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code.

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-70b-instruct")
# accept_eula defaults to False; set it to True only after you have reviewed and accepted the EULA
predictor = model.deploy(accept_eula=False)

By default, accept_eula is set to False. You need to manually accept the EULA to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy. You can also find the license agreement on the Llama website. This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To learn more, please refer to the following documentation.
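
As an illustration of overriding a default, the deployment call can be wrapped in a small helper that passes a non-default instance type to JumpStartModel. This is a sketch under assumptions: the helper name deploy_llama3 is hypothetical, the SageMaker Python SDK and AWS credentials must be available when it is called, and the instance type you pass must be one that the chosen model supports.

```python
def deploy_llama3(model_id, instance_type=None, accept_eula=False):
    """Deploy a JumpStart model, optionally overriding the default instance type."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from sagemaker.jumpstart.model import JumpStartModel

    kwargs = {"model_id": model_id}
    if instance_type is not None:
        # Override the default instance type listed for the model.
        kwargs["instance_type"] = instance_type
    model = JumpStartModel(**kwargs)
    # The deployment only succeeds when accept_eula is True.
    return model.deploy(accept_eula=accept_eula)
```

Calling deploy_llama3("meta-textgeneration-llama-3-70b-instruct", accept_eula=True) keeps the defaults; pass instance_type only when you need a different one.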

The following table lists all the Llama 3 models available in SageMaker JumpStart along with the model_ids, default instance types and maximum number of total tokens (sum of the number of input tokens and number of generated tokens) supported for each of these models.

| Model Name | Model ID | Max Total Tokens | Default instance type |
| --- | --- | --- | --- |
| Meta-Llama-3-8B | meta-textgeneration-llama-3-8b | 8192 | ml.g5.12xlarge |
| Meta-Llama-3-8B-Instruct | meta-textgeneration-llama-3-8b-instruct | 8192 | ml.g5.12xlarge |
| Meta-Llama-3-70B | meta-textgeneration-llama-3-70b | 8192 | ml.p4d.24xlarge |
| Meta-Llama-3-70B-Instruct | meta-textgeneration-llama-3-70b-instruct | 8192 | ml.p4d.24xlarge |
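
Because the limit in the table applies to input and generated tokens combined, it can help to check how much room a prompt leaves for generation before sending a request. The helper below is a minimal sketch; its name is hypothetical, and it assumes you already know the prompt's token count (for example, from the model's tokenizer).

```python
# 8192 is the "Max Total Tokens" value listed for all four models in the table.
MAX_TOTAL_TOKENS = 8192


def remaining_generation_budget(input_tokens, max_total_tokens=MAX_TOTAL_TOKENS):
    """Return the largest max_new_tokens that keeps input + output within the limit."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = max_total_tokens - input_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {input_tokens} tokens, leaving no room under {max_total_tokens}"
        )
    return remaining
```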

Run inference

After you deploy the model, you can run inference against the deployed endpoint through SageMaker predictor. Fine-tuned instruct models (Llama 3: 8B Instruct and 70B Instruct) accept a history of chats between the user and the chat assistant, and generate the subsequent chat. The pre-trained models (Llama 3: 8B and 70B) require a string prompt and perform text completion on the provided prompt.

predictor.predict(payload)

Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter controls the size of the output generated by the model. This is not the same as the number of words, because the model’s vocabulary is not the same as the English language vocabulary and a token is not necessarily an English word. The temperature parameter controls the randomness in the output. A higher temperature results in more creative outputs that are also more prone to hallucination. All the inference parameters are optional.

Example prompts for the 70B model

You can use Llama 3 models for text completion for any piece of text. Through text generation, you can perform a variety of tasks such as question answering, language translation, sentiment analysis, and more. The input payload to the endpoint looks like the following code:

payload = {
    "inputs": str,
    "parameters": {"max_new_tokens": int, "top_p": float, "temperature": float},  # optional
}
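
To make the schema above concrete, here is a minimal sketch of a helper that builds such a payload and leaves out any parameter you do not set. The function name build_payload is hypothetical; only the keys inputs, parameters, max_new_tokens, top_p, and temperature come from this post.

```python
def build_payload(prompt, max_new_tokens=None, top_p=None, temperature=None):
    """Build an inference payload, omitting parameters that were not provided."""
    payload = {"inputs": prompt}
    parameters = {}
    if max_new_tokens is not None:
        parameters["max_new_tokens"] = int(max_new_tokens)
    if top_p is not None:
        parameters["top_p"] = float(top_p)
    if temperature is not None:
        parameters["temperature"] = float(temperature)
    if parameters:
        # All inference parameters are optional, so the key is added only when needed.
        payload["parameters"] = parameters
    return payload


if __name__ == "__main__":
    print(build_payload("I believe the meaning of life is",
                        max_new_tokens=64, top_p=0.9, temperature=0.6))
```

The resulting dictionary can be passed to predictor.predict(payload).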

The following are some example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens":64, "top_p":0.9, "temperature":0.6}.

Input: Simply put, the theory of relativity states that
Output: 1) the laws of physics are the same for all observers, and 2) the speed of light is constant for all observers. The first part of the theory is known as the special theory of relativity, and the second part is known as the general theory of relativity.\nThe special theory of relativity states

Input: I believe the meaning of life is
Output: to live a life of purpose, on purpose. I believe that we are all here to make a difference in the world, and that we are all here to make a difference in the lives of others. I believe that we are all here to make a difference in the world, and that we are all here to mak

Input: A brief message congratulating the team on the launch:\n\nHi everyone,\n\nI just
Output: wanted to say congratulations on the launch of the new website! It looks great and I’m sure it will be a valuable resource for our customers. I know that a lot of hard work went into making this happen, so thank you all for your efforts.\n\nBest regards,\nJohn Doe\n

In the next example, we show how to use Llama 3 models with few-shot in-context learning, where we provide training samples to the model in the prompt. We only run inference on the deployed model, and the model weights do not change during this process.

Input: Translate English to French:\nsea otter => loutre de mer\npeppermint => menthe poivrée\nplush girafe => girafe peluche\ncheese =>
Output: fromage\npanda => panda\npenguin => pingouin\npolar bear => ours polaire\npuffin => macareux\nsloth => paresseux\nwalrus => morse\nwhale => baleine\nzebra => zèbre\nTranslate English to German:\n
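
A few-shot prompt like the one above is plain text: an instruction line, one line per worked example, and a final line that the model is left to complete. The following sketch assembles it from a list of pairs; the function name is hypothetical, and the "=>" separator simply mirrors the example.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Join an instruction, worked examples, and an open query into one prompt."""
    lines = [instruction]
    lines += [f"{source} => {target}" for source, target in examples]
    # The last line is left incomplete so the model fills in the answer.
    lines.append(f"{query} =>")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        "Translate English to French:",
        [
            ("sea otter", "loutre de mer"),
            ("peppermint", "menthe poivrée"),
            ("plush girafe", "girafe peluche"),
        ],
        "cheese",
    )
    print(prompt)
```

Because the model keeps generating after the first answer, as the output above shows, you may want to cut the response at the first newline.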

Example prompts for the 70B-Instruct model

With Llama 3 instruct models which are optimized for dialogue use cases, the input to the instruct model endpoints is the previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas, which define the chat assistant’s behavior. While the input payload format is the same as the base pre-trained model, the input text should be formatted in the following manner:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

[SYSTEM_TEXT]<|eot_id|><|start_header_id|>user<|end_header_id|>

[USER_TEXT]<|eot_id|><|start_header_id|>assistant<|end_header_id|>

In this instruction template, you can optionally start with a system role and include as many alternating roles as desired in the turn-based history. The final role should always be assistant and end with two new line feeds.
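
The template can also be assembled programmatically. The following is a minimal sketch, assuming each turn is a (role, text) pair; the function name format_llama3_prompt is hypothetical, while the special tokens are the ones shown in the template above.

```python
def format_llama3_prompt(messages, system=None):
    """Format a chat history into the Llama 3 instruct prompt layout.

    messages: list of (role, text) pairs, where role is "user" or "assistant".
    system: optional system text placed before the first turn.
    """
    parts = ["<|begin_of_text|>"]
    if system is not None:
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        )
    for role, text in messages:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"
        )
    # The prompt always ends with an open assistant header and two line feeds.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_llama3_prompt(
        [("user", "I am going to Paris, what should I see?")],
        system="Always answer with Haiku",
    ))
```

The returned string goes into the inputs field of the payload.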

Next, consider a few example prompts and responses from the model. In the following example, the user is asking a simple question to the assistant.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nwhat is the recipe of mayonnaise?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The classic condiment! Mayonnaise is a thick, creamy emulsion sauce made from a combination of oil, egg yolks, acid (such as vinegar or lemon juice), and seasonings. Here’s a simple recipe to make mayonnaise at home:\n\n**Ingredients:**\n\n* 2 egg yolks\n* 1 tablespoon lemon juice or vinegar (such as apple cider vinegar or white wine vinegar)\n* 1/2 teaspoon Dijon mustard (optional, but recommended for flavor)\n* 1/2 cup (120 ml) neutral-tasting oil, such as canola, grapeseed, or sunflower oil\n* Salt, to taste\n\n**Instructions:**\n\n1. **Start with room temperature ingredients**: This is crucial for emulsification to occur.\n2. **In a medium-sized bowl**, whisk together the egg yolks, lemon juice or vinegar, and Dijon mustard (if using) until well combined.\n3. **Slowly add the oil**: While continuously whisking the egg yolk mixture, slowly pour in the oil in a thin, steady stream. Start with a very slow drizzle and gradually increase the flow as the mixture thickens.\n4. **Whisk constantly**: Keep whisking until the mixture has doubled in volume and has a thick, creamy consistency. This should take about 5-7 minutes.\n5. **Check the consistency**: Stop whisking when the mayonnaise has reached the desired consistency. It should be smooth, thick, and hold its shape when the whisk is lifted.\n6. **Season with salt**: Add salt to taste, and whisk until well combined.\n7. **Taste and adjust**: If the mayonnaise is too thick, add a little water. If it’s not thick enough, add a little more oil.\n\n**Tips and Variations:**\n\n* Use a neutral-tasting oil to avoid flavoring the mayonnaise.\n* If you’re using a stand mixer, use the whisk attachment and start with a slow speed, increasing as needed.\n* Add flavorings like minced garlic, chopped fresh herbs (e.g., parsley, dill, or tarragon), or grated ginger to create different flavor profiles.\n* For a lighter mayonnaise, use only 1 egg yolk or add a little water to thin it out.\n* If the mayonnaise breaks or separates, don’t worry! Simply start again with a new egg yolk and slowly whisk in the broken mayonnaise.\n\n**Homemade Mayonnaise vs. Store-Bought:**\n\nWhile store-bought mayonnaise is convenient, homemade mayonnaise offers several advantages:\n\n* Freshness: Homemade mayonnaise is made to order, ensuring optimal flavor and texture.\n* Customization: You can tailor the flavor to your liking with various seasonings and ingredients.\n* No preservatives: Homemade mayonnaise doesn’t contain added preservatives, making it a healthier option.\n\nEnjoy your freshly made mayonnaise!

In the following example, the user has a conversation with the assistant about tourist sites in Paris. Then the user inquires about the first option recommended by the chat assistant.

Input: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nParis, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:\n\n1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city.\n2. The Louvre Museum: The Louvre is one of the world’s largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa.\n3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.\n\nThese are just a few of the many attractions that Paris has to offer. With so much to see and do, it’s no wonder that Paris is one of the most popular tourist destinations in the world.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nWhat is so great about #1?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: The Eiffel Tower! It’s an iconic symbol of Paris and one of the most recognizable landmarks in the world. Here are some reasons why it’s so great:\n\n1. **Engineering marvel**: When it was built for the 1889 World’s Fair, the Eiffel Tower was a groundbreaking feat of engineering. It was the tallest structure in the world at the time, standing at 324 meters (1,063 feet).\n2. **Panoramic views**: The Eiffel Tower offers breathtaking 360-degree views of the city from its observation decks. You can see many of Paris’ famous landmarks, such as the Seine River, the Louvre, and Montmartre.\n3. **Romantic atmosphere**: The Eiffel Tower is often associated with romance and is a popular spot for proposals, weddings, and romantic dinners. The tower’s iron latticework and twinkling lights at night add to its romantic ambiance.\n4. **Historical significance**: The Eiffel Tower has played a significant role in French history, serving as a communication tower during World War I and a symbol of French culture and engineering.\n5. **Accessibility**: The Eiffel Tower is easily accessible by public transportation or on foot, and there are various ways to experience it, including taking the stairs, elevator, or a guided tour.\n6. **Nighttime spectacle**: The Eiffel Tower is even more magical at night, when it’s illuminated with thousands of twinkling lights. The light show, known as the “Eiffel Tower Sparkle,” takes place every hour on the hour from dusk to midnight.\n7. **Iconic status**: The Eiffel Tower has been featured in countless films, literature, and art, making it an instantly recognizable symbol of Paris and France.\n\nOverall, the Eiffel Tower is a must-visit attraction in Paris, offering a unique combination of history, engineering, romance, and stunning views.

In the following examples, we set the system’s configuration.

Input: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with Haiku<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: Eiffel’s iron lace\nRiver Seine’s gentle flow by\nMontmartre’s charm calls

Input: <|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with emojis<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHow to go from Beijing to NY?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
Output: ✈🛬🇺🇸

Clean up

After you’re done running the notebook, make sure to delete all the resources that you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Llama 3 models in SageMaker Studio. You now have access to four Llama 3 foundation models that contain billions of parameters. Because foundation models are pretrained, they can also help lower training and infrastructure costs and enable customization for your use case. Check out SageMaker JumpStart in SageMaker Studio now to get started.

SageMaker JumpStart documentation
SageMaker JumpStart Foundation Models documentation
SageMaker JumpStart product detail page
SageMaker JumpStart model catalog

About Authors

Kyle Ulrich is an Applied Scientist II at AWS
Xin Huang is a Senior Applied Scientist at AWS
Qing Lan is a Senior Software Developer Engineer at AWS
Haotian An is a Software Developer Engineer II at AWS
Christopher Whitten is a Software Development Engineer II at AWS
Tyler Osterberg is a Software Development Engineer I at AWS
Manan Shah is a Software Development Manager at AWS
Jonathan Guinegagne is a Senior Software Developer Engineer at AWS
Adriana Simmons is a Senior Product Marketing Manager at AWS
June Won is a Senior Product Manager at AWS
Ashish Khetan is a Senior Applied Scientist at AWS
Rachna Chadha is a Principal Solution Architect at AWS
Deepak Rupakula is a Principal GTM Specialist at AWS

SAMMO: A general-purpose framework for prompt optimization

Large language models (LLMs) have revolutionized a wide range of tasks and applications that were previously reliant on manually crafted machine learning (ML) solutions, streamlining through automation. However, despite these advances, a notable challenge persists: the need for extensive prompt engineering to adapt these models to new tasks. New generations of language models like GPT-4 and Mixtral 8x7B advance the capability to process long input texts. This progress enables the use of longer inputs, providing richer context and detailed instructions to language models. A common technique that uses this enhanced capacity is the Retrieval Augmented Generation (RAG) approach. RAG dynamically incorporates information into the prompt based on the specific input example. This process is illustrated in Figure 1, which shows a RAG prompt designed to translate user queries into a domain-specific language (DSL), also known as semantic parsing. 

Figure 1: A RAG prompt is used for a semantic parsing task. The underlying prompt consists of three larger parts, each with a variety of aspects that can be optimized.

The example in Figure 1 combines three distinct structures to construct the final prompt. The first structure, the task description, remains static and independent of the input as a result of conventional prompt optimization techniques. However, RAG contains two input-specific structures: the example retriever and the input text itself. These introduce numerous optimization opportunities that surpass the scope of most traditional approaches. Despite previous efforts in prompt optimization, the evolution towards more complex prompt structures has rendered many older strategies ineffective in this new context. 

SAMMO: A prompt optimization approach 

To address these challenges, we developed the Structure-Aware Multi-objective Metaprompt Optimization (SAMMO) framework. SAMMO is a new open-source tool that streamlines the optimization of prompts, particularly those that combine different types of structural information like in the RAG example above. It can make structural changes, such as removing entire components or replacing them with different ones. These features enable AI practitioners and researchers to efficiently refine their prompts with little manual effort.

Central to SAMMO’s innovation is its approach to treating prompts not just as static text inputs but as dynamic, programmable entities—metaprompts. SAMMO represents these metaprompts as function graphs, where individual components and substructures can be modified to optimize performance, similar to the optimization process that occurs during traditional program compilation.

The following key features contribute to SAMMO’s effectiveness:

Structured optimization: Unlike current methods that focus on text-level changes, SAMMO focuses on optimizing the structure of metaprompts. This granular approach facilitates precise modifications and enables the straightforward integration of domain knowledge, for instance, through rewrite operations targeting specific stylistic objectives. 
 
Multi-objective search: SAMMO’s flexibility enables it to simultaneously address multiple objectives, such as improving accuracy and computational efficiency. Our paper illustrates how SAMMO can be used to compress prompts without compromising their accuracy.

General purpose application: SAMMO has proven to deliver significant performance improvements across a variety of tasks, including instruction tuning, RAG, and prompt compression.
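
To give a feel for what a structural change means, as opposed to a text-level edit, here is a toy sketch that represents a prompt as a small tree of named sections and removes one of them. It only illustrates the idea: the Section class and drop_section function are hypothetical and are not SAMMO's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    """One node of a toy prompt tree: optional text plus child sections."""
    name: str
    text: str = ""
    children: List["Section"] = field(default_factory=list)

    def render(self) -> str:
        """Flatten the tree into the final prompt text, depth first."""
        parts = [self.text] if self.text else []
        parts += [child.render() for child in self.children]
        return "\n".join(part for part in parts if part)


def drop_section(node: Section, name: str) -> Section:
    """Return a copy of the tree without any section called `name`."""
    kept = [drop_section(child, name) for child in node.children if child.name != name]
    return Section(node.name, node.text, kept)


if __name__ == "__main__":
    prompt = Section("root", children=[
        Section("task", "Translate the query into the DSL."),
        Section("examples", "Q: largest state\nA: largest(state(all))"),
        Section("input", "Q: smallest city"),
    ])
    # A structural mutation: remove the whole few-shot block in one step.
    print(drop_section(prompt, "examples").render())
```

A search procedure could generate many such variants and score each one against objectives like accuracy and prompt length.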

Exploring SAMMO’s impact through use cases 

Use case 1: RAG optimization 

A common application of LLMs involves translating natural user queries into domain-specific language (DSL) constructions, often to communicate with external APIs. For example, Figure 1 shows how an LLM can be used to map user queries about geography facts to a custom DSL.

In a realistic RAG scenario, SAMMO demonstrates significant performance improvements. To demonstrate this, we conducted experiments across three semantic parsing datasets of varying complexity: GeoQuery, SMCalFlow, and Overnight. Given the often limited availability of data in practical settings, we trained and tested the model on a subsampled dataset (training and retrieval set n=600, test set n=100). We compared SAMMO against a manually designed competitive baseline, using enumerative search within a search space of 24 configurations. This included variations in data formats, the number of few-shot examples, and DSL specifications.  

Evaluation 

As illustrated in Figure 2, SAMMO improved accuracy across different datasets and backend LLMs in almost all cases, with the most notable gains observed in older generation models. However, even with newer models like GPT-4, SAMMO facilitated accuracy improvements exceeding 100 percent.

Figure 2: For semantic parsing with RAG, SAMMO achieves substantial improvements across most backend models and datasets. 

Use case 2: Instruction tuning 

Instruction tuning addresses the optimization of static instructions given to LLMs that provide the goal and constraints of a task. To show that SAMMO extends beyond many previous prompt tuning methods, we applied it to this conventional setting.

To align with previous research, we used eight zero-shot BigBench classification tasks where the baseline prompt for GPT-3.5 achieved an accuracy of less than 0.9. We compared it against Automatic Prompt Optimization (APO) and GrIPS, applying open-source models Mixtral 8x7B and Llama-2 70B, alongside GPT-3.5 as backend LLMs. We did not include GPT-4 due to minimal improvement potential identified in pilot experiments. The results, shown in Figure 3, demonstrate that SAMMO outperformed all baselines regardless of the backend model, proving its effectiveness with even more complex metaprompts.

Figure 3: SAMMO does at least as well as older methods for instruction tuning on simpler tasks.

Implications and looking forward

SAMMO introduces a new and flexible approach to optimize prompts for specific requirements. Its design works with any LLM, and it features versatile components and operators suitable for a broad range of applications.

We are excited to integrate and apply SAMMO to the components and pipelines behind AI-powered assistant technologies. We also hope to establish a user-driven community centered around SAMMO, where people can exchange best practices and patterns, and encourage the expansion of the existing set of search operators.


The post SAMMO: A general-purpose framework for prompt optimization appeared first on Microsoft Research.

Slack delivers native and secure generative AI powered by Amazon SageMaker JumpStart

This post is co-authored by Jackie Rocca, VP of Product, AI at Slack

Slack is where work happens. It’s the AI-powered platform for work that connects people, conversations, apps, and systems together in one place. With the newly launched Slack AI—a trusted, native, generative artificial intelligence (AI) experience available directly in Slack—users can surface and prioritize information so they can find their focus and do their most productive work.

We are excited to announce that Slack, a Salesforce company, has collaborated with Amazon SageMaker JumpStart to power Slack AI’s initial search and summarization features and provide safeguards for Slack to use large language models (LLMs) more securely. Slack worked with SageMaker JumpStart to host industry-leading third-party LLMs so that data is not shared with the infrastructure owned by third-party model providers.

This keeps customer data in Slack at all times and upholds the same security practices and compliance standards that customers expect from Slack itself. Slack is also using Amazon SageMaker inference capabilities for advanced routing strategies to scale the solution to customers with optimal performance, latency, and throughput.

“With Amazon SageMaker JumpStart, Slack can access state-of-the-art foundation models to power Slack AI, while prioritizing security and privacy. Slack customers can now search smarter, summarize conversations instantly, and be at their most productive.”

– Jackie Rocca, VP Product, AI at Slack

Foundation models in SageMaker JumpStart

SageMaker JumpStart is a machine learning (ML) hub that can help accelerate your ML journey. With SageMaker JumpStart, you can evaluate, compare, and select foundation models (FMs) quickly based on predefined quality and responsibility metrics to perform tasks like article summarization and image generation. Pretrained models are fully customizable for your use case with your data, and you can effortlessly deploy them into production with the user interface or SDK. In addition, you can access prebuilt solutions to solve common use cases and share ML artifacts, including ML models and notebooks, within your organization to accelerate ML model building and deployment. None of your data is used to train the underlying models. All the data is encrypted and is never shared with third-party vendors so you can trust that your data remains private and confidential.

Check out the SageMaker JumpStart model page for available models.

Slack AI

Slack launched Slack AI to provide native generative AI capabilities so that customers can easily find and consume large volumes of information quickly, enabling them to get even more value out of their shared knowledge in Slack.  For example, users can ask a question in plain language and instantly get clear and concise answers with enhanced search. They can catch up on channels and threads in one click with conversation summaries. And they can access personalized, daily digests of what’s happening in select channels with the newly launched recaps.

Because trust is Slack’s most important value, Slack AI runs on enterprise-grade infrastructure that Slack built on AWS, upholding the same security practices and compliance standards that customers expect. Slack AI is built for security-conscious customers and is secure by design: customer data remains in-house, data is not used for LLM training purposes, and data remains siloed.

Solution overview

SageMaker JumpStart provides access to many LLMs, and Slack selects the right FMs that fit their use cases. Because these models are hosted on Slack’s own AWS infrastructure, data sent to models during invocation doesn’t leave Slack’s AWS infrastructure. In addition, to provide a secure solution, data sent for invoking SageMaker models is encrypted in transit. The data sent to SageMaker JumpStart endpoints for invoking models is not used to train base models. SageMaker JumpStart allows Slack to support high standards for security and data privacy, while also using state-of-the-art models that help Slack AI perform optimally for Slack customers.

SageMaker JumpStart endpoints serving Slack business applications are powered by AWS instances. SageMaker supports a wide range of instance types for model deployment, which allows Slack to pick the instance that is best suited to support latency and scalability requirements of Slack AI use cases. Slack AI has access to multi-GPU instances to host their SageMaker JumpStart models. Multi-GPU instances allow each instance backing Slack AI’s endpoint to host multiple copies of a model. This helps improve resource utilization and reduce model deployment cost. For more information, refer to Amazon SageMaker adds new inference capabilities to help reduce foundation model deployment costs and latency.

The following diagram illustrates the solution architecture.

To use the instances most effectively and support the concurrency and latency requirements, Slack used SageMaker-offered routing strategies with their SageMaker endpoints. By default, a SageMaker endpoint distributes incoming requests uniformly across ML instances using a routing strategy called RANDOM. However, with generative AI workloads, requests and responses can be extremely variable, and it’s desirable to load balance by considering the capacity and utilization of each instance rather than picking instances at random. To effectively distribute requests across instances backing the endpoints, Slack uses the LEAST_OUTSTANDING_REQUESTS (LOR) routing strategy. This strategy routes requests to the specific instances that have more capacity to process requests instead of randomly picking any available instance. The LOR strategy provides more uniform load balancing and resource utilization. As a result, Slack AI saw a decrease of more than 39% in p95 latency when enabling LEAST_OUTSTANDING_REQUESTS compared to RANDOM.
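To make the intuition concrete, here is a toy simulation of the two routing strategies. It is not SageMaker code and does not reproduce Slack's measurements. The arrival rate, the mix of short and long requests, the instance count, and the one-request-at-a-time service model are all assumptions chosen only to mimic a highly variable workload.

```python
import random
import statistics
from collections import deque


def simulate(strategy, num_instances=4, num_requests=5000, seed=0):
    """Return per-request latencies for a toy endpoint under one strategy."""
    workload_rng = random.Random(seed)      # same workload for every strategy
    routing_rng = random.Random(seed + 1)   # used only by RANDOM routing
    # Finish times of requests still in flight on each instance (FIFO order).
    in_flight = [deque() for _ in range(num_instances)]
    now = 0.0
    latencies = []
    for _ in range(num_requests):
        now += workload_rng.expovariate(11.0)  # assumed arrivals per second
        # Assumed workload: mostly short requests, occasionally a long one.
        service = 1.9 if workload_rng.random() < 0.1 else 0.1
        # Forget requests that have already completed by the arrival time.
        for queue in in_flight:
            while queue and queue[0] <= now:
                queue.popleft()
        if strategy == "RANDOM":
            target = routing_rng.randrange(num_instances)
        elif strategy == "LEAST_OUTSTANDING_REQUESTS":
            target = min(range(num_instances), key=lambda i: len(in_flight[i]))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        queue = in_flight[target]
        # Each instance serves one request at a time in this toy model.
        start = queue[-1] if queue else now
        finish = start + service
        queue.append(finish)
        latencies.append(finish - now)
    return latencies


def p95(values):
    """95th percentile: the last of the 19 cut points that split data in 20."""
    return statistics.quantiles(values, n=20)[-1]


if __name__ == "__main__":
    for name in ("RANDOM", "LEAST_OUTSTANDING_REQUESTS"):
        print(name, round(p95(simulate(name)), 2))
```

In this model, random routing can place a request behind a long-running one even when another instance is idle, which inflates tail latency. Routing by outstanding request count avoids most of those collisions.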

For more details on SageMaker routing strategies, see Minimize real-time inference latency by using Amazon SageMaker routing strategies.

Conclusion

Slack is delivering native generative AI capabilities that will help their customers be more productive and easily tap into the collective knowledge that’s embedded in their Slack conversations. With fast access to a large selection of FMs and advanced load balancing capabilities that are hosted in dedicated instances through SageMaker JumpStart, Slack AI is able to provide rich generative AI features more robustly and quickly, while upholding Slack’s trust and security standards.

Learn more about SageMaker JumpStart, Slack AI and how the Slack team built Slack AI to be secure and private. Leave your thoughts and questions in the comments section.

About the Authors

Jackie Rocca is VP of Product at Slack, where she oversees the vision and execution of Slack AI, which brings generative AI natively and securely into Slack’s user experience. In her five years at Slack, Jackie has delivered on a number of initiatives to push Slack’s business forward. Now she’s on a mission to help customers accelerate their productivity and get even more value out of their conversations, data, and collective knowledge with generative AI. Prior to her time at Slack, Jackie was a Product Manager at Google for more than six years, where she helped launch and grow YouTube TV. Jackie is based in the San Francisco Bay Area.

Rachna Chadha is a Principal Solutions Architect AI/ML in Strategic Accounts at AWS. Rachna is an optimist who believes that the ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Maninder (Mani) Kaur is the AI/ML Specialist lead for Strategic ISVs at AWS. With her customer-first approach, Mani helps strategic customers shape their AI/ML strategy, fuel innovation, and accelerate their AI/ML journey. Mani is a firm believer in ethical and responsible AI, and strives to ensure that her customers’ AI solutions align with these principles.

Gene Ting is a Principal Solutions Architect at AWS. He is focused on helping enterprise customers build and operate workloads securely on AWS. In his free time, Gene enjoys teaching kids technology and sports, as well as following the latest on cybersecurity.

Alan Tan is a Senior Product Manager with SageMaker, leading efforts on large model inference. He’s passionate about applying machine learning to the area of analytics. Outside of work, he enjoys the outdoors.

Report to Congress on Israel and Hamas Conflict, U.S. Policy Options

The following is the April 16, 2024, Congressional Research Service report, Israel and Hamas Conflict In Brief: Overview, U.S. Policy, and Options for Congress.

From the report

On October 7, 2023, the Palestinian Sunni Islamist group Hamas (a U.S.-designated foreign terrorist organization, or FTO) led surprise attacks against Israel from the Gaza Strip. More than 1,200 Israelis and foreign nationals (including at least 35 U.S. citizens in Israel) were killed. Hamas and other groups also seized 253 hostages on October 7. Iran reportedly provides material support to Hamas, and according to U.S. officials may be complicit in a broad sense, but President Joe Biden said in October “there is no evidence” that Iran helped plan the attack. The ongoing conflict could affect and be affected by Israel-Iran conflict elsewhere (see, for example, CRS Insight IN12347, Escalation of the Israel-Iran Conflict, coordinated by Jeremy M. Sharp).

In response to the October 7 attacks, Israel declared war on Hamas and launched aerial bombardment and ground operations in Gaza. As of April 15, 2024, more than 33,000 Palestinians in Gaza have been killed, according to the Hamas-controlled Gaza health ministry. Israel publicly seeks the elimination of Hamas’s military and governing capabilities and the release of all hostages; Hamas’s tunnels have made these tasks more challenging. The Israeli military has reportedly cleared some key areas in Gaza of visible Hamas control, withdrawn tens of thousands of troops, and apparently shifted toward lower-intensity warfare. Israel has said it will conduct a ground campaign in Gaza’s southernmost city of Rafah to corner Hamas’s intact fighting units. Israeli leaders are reportedly debating how to balance declared national objectives regarding Hamas and hostage recovery, and also international concerns about the welfare of the over one million civilians in Rafah—most of whom earlier fled other conflict areas.

The conflict’s impact in Gaza, with an estimated 1.7 million Gazans displaced (out of a population of approximately 2.1 million people) has generated a humanitarian crisis. Some food and other essential supplies have entered Gaza via Egypt and Israel during the conflict, but significant delays and other obstacles to transporting aid through crossings and Israeli checkpoints and then safely delivering it—particularly in northern Gaza—contributed to a March warning from the Integrated Food Security Phase Classification that “famine is imminent.” The United States and other countries began in March to provide additional humanitarian assistance to Gaza via airdrops and a planned maritime corridor. President Biden has ordered the U.S. military to construct a temporary port just offshore from Gaza to facilitate commercial shipments of assistance, with a goal to begin humanitarian assistance deliveries from the port by sometime in May.

The Biden Administration has provided broad political and material support for Israel’s defense and its stated goals to end Hamas’s rule in Gaza and recover hostages, but has sharpened its public criticism of Israeli actions. After an April 1 IDF drone strike killed seven relief workers in Gaza, President Biden stated that “strikes on humanitarian workers and the overall humanitarian situation are unacceptable.” He added that U.S. policy “will be determined by our assessment of Israel’s immediate action” to address “civilian harm, humanitarian suffering, and the safety of aid workers.” The Administration also has pressed Israel not to proceed with major military operations in Rafah without clearly planning for civilian safety and welfare. Negotiations involving the United States, Egypt, and Qatar have sought to resolve sticking points between Israel and Hamas on a potential multi-week cease-fire and hostage-prisoner exchange deal. In November, Israel and Hamas agreed to a multi-day pause in fighting featuring hostage-prisoner exchanges. Around 129 hostages reportedly remain in Gaza, including some Americans (with at least 34 hostages presumed dead; some reports indicate that number may be higher).

Officials from the United States, Israel, and the West Bank-based Palestinian Authority (PA) have differed publicly on some proposed aspects of post-conflict security and governance for Gaza. U.S. officials have expressed support for a resumption of PA rule in Gaza after the PA undertakes certain reforms, as part of efforts to move toward a two-state solution; PA and other Arab leaders insist on progress toward a Palestinian state at some point during such a transition. Israeli Prime Minister Benjamin Netanyahu has insisted that Israel have full security control of “all territory west of the Jordan River.”

In an October 2023 supplemental budget request, President Biden asked Congress to appropriate more than $14 billion in Israel-related funding, and more than $9 billion in global humanitarian assistance, with the Administration saying in February that it expected $1.4 billion of this funding to be provided for Gaza. Members of Congress have expressed differing views on the request and its various elements, and debate has increased in intensity over what level of oversight or conditionality (if any) to place on security assistance to Israel and humanitarian assistance to Gaza. The House and Senate have each passed bills and continue to deliberate on the way forward.

Download the document here.